ABSTRACT


This study began with the [illegible] postulate that all
human performance in a choice situation tends to be made on a
systematic basis.

In the setting of multiple choice achievement tests, this
postulate resolved itself into three operational hypotheses which form
a sufficient set to establish the possibility that it
applied to both the right and the wrong answers given by examinees.
Testing these three hypotheses involved the following procedures: 

1) To develop and logically validate a systematic method for
the construction of the foils (distractors) on a multiple
choice achievement test designed to measure higher mental
processes.

2) To show that the construct validity of this systematic
method held up reasonably well in the results of the
administration of this test.

3) To show that the validly produced foils in this context
improved the predictive validity of this test with respect
to other achievement tests over the more usual procedures.

The results of the study tended in general to support these
three hypotheses fairly strongly if we take into account the finding
that many of the foils could be classified into more than one category,
as evident in the low interrater reliability, the need to reclassify
foils when wrong answer patterns were being interpreted, and the manner
in which these interpreted foil categories cross-validated. This study
would seem to have produced three fairly definite findings:

1. Human performance, when abstracted from responses to multiple
choice achievement tests involving higher mental processes,
would seem to be systematic, and to display evidence of
multiple interpretation of the communication.

2. There would seem to be a hierarchy of foils which parallels
the hierarchy of right answers and which influences the way
in which each total item performs. The levels of the foils
themselves seem to depend upon the ways in which this
totality of each item is approached.

3. Wrong answers contain potentially useful information with 
respect to achievement when higher mental processes are 
involved. 

Taxonomic tests would seem to have a number of properties not 
assumed to be present when the test is sufficiently homogeneous to be 
assumed to form a scale. The existence of these properties made it 
fairly evident that the more commonly used analytic procedures were 
probably inappropriate for the analysis and interpretation of the results
from, and the criteria for evaluating the effectiveness of, this type of
test. The alternative analytic procedures which the findings of the study
implied were organized into a suggestion for the extension of test theory
designed to deal with the problems which seem to arise where taxonomic
tests are concerned.

Some implications of the findings for educational practice were
drawn, and a number of suggestions for future research into this area
were presented.
CHAPTER I
INTRODUCTION 


Psychological literature has been replete with studies of choice 
behavior. Chown (1959), Duncan (1959) and more recently Hunt (1961) and 
Berlyne (1965) have reviewed this literature adequately. From these 
studies it is fairly evident that the distribution of choices made by 
humans in problem-solving situations tends to exhibit some systematic 
trends. 

These trends, however, have often been confined to discussion in 
terms of the patterns of "success" when compared with the nature and 
complexity of the task. Many studies, for example Strutz (1966), concentrate
mainly on "right" answers and the relevant patterns involved.

A notable second direction in this area has been the attempts to 
describe the nature of the procedure used by the person in his attempt 
to solve problems; for instance, Piaget (1953)* proposed a logical 
system for these procedures. Abelson and Rosenberg (1958) proposed a
logical system which they call "psycho-logic" and which proposes a set of
"logical" procedures which lead to certain types of "wrong" answers.

Specifically, within the area of achievement testing the
multiple choice test provides a good opportunity to observe choice 
behavior in a problem-solving setting. With the advent of Bloom's 
Taxonomy (1956) a considerable improvement in the classification of 
test items involving these "higher mental processes" became possible. 


The purpose of this present study will be to 1) demonstrate the presence or
absence of this additional information, 2) identify the general
properties, if any, of this information, and 3) speculate as to the
implications of such findings. Given systematic choice behavior,
consistencies would be expected within and between individuals for all
choices made. These consistencies may not be confined to the
"successes" or "right" answers. This study proposes to explore
certain aspects of the possibility that the choices made among wrong
answers may also be systematic and therefore contain useful information
for the examiner.

*Furth (1969) discusses the Piaget model in some detail.


Statement of the Problem 

Some aspects of the possibility that wrong answers are selected
systematically have been examined [cf. Fouldes and Forbes, 1965; Powell,
1968; Jacobs and Vandeventer, 1968; Powell and Isbister, 1969], and these
studies are discussed in more detail in Chapter II. 

The specific concern of this study is to show to the designers 
of tests the significance of wrong-answer information. Any significant 
improvement in a test must be reflected in a corresponding improvement
in the validity of the test. This study proposes to examine the 
construct and predictive validities of a particular method of test 
construction. The purpose of this study, then, is to explore the
possibility that, if tests are constructed in a particular manner,
"wrong" answers may add to the examiner's information about the
examinee. This present study will be content to demonstrate the
presence or absence of this additional information, to determine the
major properties of this information, and to speculate as to the
implications of such findings.




















CHAPTER II
BACKGROUND FOR THE STUDY


As already mentioned in Chapter I, this study was essentially 
exploratory. For this reason, little attempt has been made to 
establish a theoretical rationale upon which the hypothetical structure 
of the study might be built. Instead, the study was organized on the 
basis of procedural considerations. The absence of a theoretical 
rationale for the interpretation of results had the advantage that the 
data could be examined for consistent characteristics and the 
properties of these [illegible].*

Of course the nature of the procedures employed provides
definite limitations upon the interpretations. The findings themselves 
would tend to provide other limitations upon the interpretation, and 
also the generalizability of the findings. 

It is the purpose of the present chapter to review the most
significant research which is relevant to the present study in order to
present the research background which forms the basis for the
procedural considerations which were employed.


The Problem of Measuring Academic Performance 

The functions of, and therefore the outcomes of, education are
a subject of debate which is beyond the scope of this dissertation.
However, these functions and their corresponding outcomes have an
important bearing on the nature of, and the interpretations given to,
the results of the various kinds of measuring instruments used.

*Glaser and Strauss (1967) developed a complete rationale to
justify this procedure for exploratory studies.

Tests are formalized communications between examiner and 
examinee. The examiner is attempting to obtain a controlled sample of
behavior to assist him in the rendering of certain judgments. These
judgments may be either value judgments (continue, withdraw, certificate)
or they may be procedural (concerning the nature of the appropriate treatments
or programmes). The greatest possible control of the behavior
sample is found in the multiple choice test. 

As communications, tests involve several important considerations. 
First, they involve the examiner's perception of the examinees in terms 
of the capabilities they do have, and the capabilities they should have 
(educational goals). These perceptions lead to the examiner's decisions 
as to which information to give in the examination, in what format, and 
which information to withhold.

Second, on the basis of these considerations a communication is
formulated. For the purpose of this study these communications will be
confined to the multiple choice achievement test.

Third, the communication is presented to the examinee, who is
expected to interpret it and to respond to it. He will do so on the
basis of the capabilities he possesses; the information he possesses,
particularly that part of the information pool which was withheld by the
examiner; and his sense of the importance to him, as a person, of the
answers he gives, including the information about himself which he wishes
to [illegible].

Fourth, the examiner then has the task of interpreting the
responses of the examinees and making such value or procedural judgments
as may be appropriate to these interpretations and to the
purpose of the test. To form these judgments the examinee's performance
can be compared, as appropriate, with 1) his own past performance in
similar contexts, 2) the performance of others in the same context
(norm referencing), or 3) some external behavioral definition of mastery
(criterion referencing).

However, where the subject matter content is itself open to
disagreement, examiners themselves may not agree as to the appropriateness
of the communication or its interpretation. Also, the examiner's
assumptions about the capabilities and information background of the
examinees may not be congruent with their actual characteristics. Furthermore,
there may be little similarity between the examiner's purposes
and the examinee's interpretation of these purposes. In addition, if an
examinee has systematically misclassified a particular concept and this
concept recurs with a high degree of frequency in a test, the examinee
is likely to obtain a low total-correct score. Giving him the opportunity to
correct this misclassification could lead to a much higher total score.
How serious, then, must a misclassification be considered? Finally,
suppose that the examiner misclassifies? Such an event is bound to have
an adverse effect on the total-correct score of the profoundly informed
student, as Hoffman (1962) points out. The combined effects of such
considerations upon the composition of total-correct scores complicate
their interpretation.


Technical Consideration in the Measurement of Choice Behavior 
This study will confine itself to the choice behavior of 
examinees as exemplified in their responses to multiple choice achievement
tests. The particular point of view to be expressed is relative to
the way in which current practice tends to use wrong-answer information.

Current practice for scoring achievement tests. Present
practice for scoring multiple choice achievement tests is to count the
number of "right" answers selected by the examinee on a test or a 
subtest. The "right" answers are usually predetermined although 
experience with particular items may lead to subsequent revisions. 

In such tests the examinee is faced with several alternatives only one
of which is "right." This means that he can make a wrong choice among
several alternatives. In general, however, distinctions which might be
made among students on the basis of differences among the wrong answers
selected are not considered when the students' scores are evaluated.
If wrong answers are used for any purpose it is usually to correct the
scores for guessing.


There are three general areas in which the specific 
characteristics of a test can be improved. These are: 

1. Reliability

2. Validity

3. Useability

The third of these characteristics, useability, can be dispensed
with quickly because the simplicity of administration, the
simplicity and objectivity of scoring of multiple choice tests, and the
ease in the establishment and use of norms have been well established.
The other two characteristics require more discussion.


The reliability of a test. The concept of test reliability
involves how well the test measures whatever it measures. The APA
Standards (1966) lists three methods of estimating reliability:

1. Internal consistency

2. Reliability between forms

3. Reliability over time

The latter two are determined by correlation coefficients either
between alternative forms, or between repeated administrations of the
same form on the same group of examinees. Neither of these two
approaches is directly applicable to this study.

There are several possible approaches to the study of internal 
consistency. These are: 

1. Item analysis 

2. Using a correction-for-guessing 

3. Designing the test to form a scale 

4. Examining internal characteristics of the test

5. Part-whole comparisons


Internal consistency from item analysis. There are two schools
of thought with respect to item analysis. The classical approach
assumes that all of the items on the test should measure the same overall
characteristic that the whole test, or the relevant subtest,
measures. To determine this similarity, the distribution of right
answers on a particular item is correlated with the distribution of
total-correct scores. The biserial correlation coefficient is usually
used. This correlation coefficient actually correlates a dichotomous
variable with a continuous variable. For the answers of a multiple
choice item to be dichotomous, the plural set of wrong answers must be
treated as a single variable. Similarly, the discrete distribution of
total scores (the total scores are the sum of a binary vector of "right-wrong"
decisions, which sum must be a whole number) must be treated as a
continuous variable. The problem that score data provide a discrete
distribution is avoided by assuming that total scores are "best
estimates" of "true scores," and true scores are assumed to form a
continuous distribution. There is, of course, a multiserial correlation
[cf. Jaspen, 1946] which could be used to take account of the plurality
of "wrong" answers. This latter coefficient is rarely used to evaluate
multiple choice test items. Classical test theory advocates that the
biserial correlation coefficient for each item should be high
(significantly different from zero).
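
The item-total correlation described above can be sketched as follows. This is an illustration only, not the procedure used in the study: the function names and the data are hypothetical, and the biserial coefficient is obtained from the point-biserial by the usual textbook conversion, which assumes a normally distributed trait underlying the right-wrong dichotomy.

```python
from statistics import NormalDist, fmean, pstdev


def point_biserial(item, totals):
    """Correlate a 0/1 item vector (1 = right) with total-correct scores."""
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    p = len(right) / len(item)          # proportion answering correctly
    q = 1 - p
    return (fmean(right) - fmean(wrong)) / pstdev(totals) * (p * q) ** 0.5


def biserial(item, totals):
    """Biserial estimate; requires 0 < p < 1 and assumes a normal latent trait."""
    p = sum(item) / len(item)
    q = 1 - p
    # Ordinate of the standard normal curve at the point dividing p from q.
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return point_biserial(item, totals) * (p * q) ** 0.5 / y


# Hypothetical data: eight examinees, one item, and their total scores.
item = [1, 1, 0, 1, 0, 1, 0, 0]
totals = [9, 8, 7, 6, 5, 4, 3, 2]
print(round(point_biserial(item, totals), 3), round(biserial(item, totals), 3))
```

Note that every wrong answer is coded 0 regardless of which foil was chosen, which is exactly the collapsing of the plural set of wrong answers that the paragraph describes.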

An alternative approach suggested by Lord (1952) concerns the
considerations necessary for the addition of scores. In order to add two
numbers, they must be independent, that is, the set of lattice points
each represents can share no elements in common. By this approach,
individual items should be relatively uncorrelated, but should
collectively form a scale.

Both of these procedures tend to treat a multiple choice item
as a dichotomy, thus overlooking the fact that more than one choice can
be made among the set of foils.


Using a correction-for-guessing. Originally, guessing
corrections for scoring formulae assumed that the number of right
answers which are attributable to guessing was directly related to the
number of wrong answers given and inversely related to the number of
alternatives per item. Because any answer could be a "guess," no
meaning could be ascribed to particular answers. Meaning was thus
assumed to be confined to some form of cumulative score. This
correction has the effect of increasing the variance of the total scores
because a greater amount is subtracted from the low scores than from the
high ones.
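
The effect on the variance can be illustrated with a short sketch. The text does not state the scoring formula; the conventional form, rights minus wrongs divided by the number of alternatives less one, is assumed here, as are the hypothetical scores and the simplification that no items are omitted.

```python
from statistics import pvariance


def corrected_score(right, wrong, n_alternatives):
    """Conventional guessing correction (assumed form): R - W / (k - 1)."""
    return right - wrong / (n_alternatives - 1)


# Hypothetical 40-item test with 4 alternatives per item and no omissions,
# so the number wrong is simply n_items minus the number right.
n_items, k = 40, 4
raw = [10, 20, 30, 40]
corrected = [corrected_score(r, n_items - r, k) for r in raw]

print(corrected)
print(pvariance(raw), pvariance(corrected))
```

Under these assumptions the corrected score is a linear function of the raw score with slope k / (k - 1), so the variance grows by the square of that factor; the lowest scorer loses ten points while the highest loses none.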

With respect to corrections-for-guessing, Gupta and Penfold
(1961) showed that the guessing correction over-corrects in the event
that the examinee is responding on the basis of misinformation. A
similar argument can be presented to suggest that this correction under-corrects
the partially-informed examinee. More recently, Shuford and
Massengill (1965) elaborated upon a system of "confidence scoring" in
which the examinee rates every alternative on the basis of his confidence
that each particular alternative is right. Honesty is
encouraged on the basis that "confidently wrong" loses marks. This
procedure makes it possible to classify each examinee's answer to each
question as 1) well informed, 2) partially informed, 3) uninformed,
and 4) misinformed. This procedure solves the guessing correction
problem by identifying which items were "guessed," thus increasing the
interpretability of particular items and hence the validity of the test.
The scoring method these authors developed increases the internal consistency
of the test by increasing the true score variance estimates
proportionally more than the total score variance.


that the practice of distinguishing among individuals on the basis of 
total scores without considering the constituents of those scores may 
produce information loss. This argument can lead to the proliferation
of subtests, or it can lead to test designs in which the scores form a
scale. For instance, Cox and Graham (1966) propose a system for
designing a test which uses Gagné's task analysis [cf. Gagné, 1965,
Chapter VII] to produce a Guttman (1954) scale. In this case, the
score may indicate the level of mastery. Again, the internal
consistency of the test can be increased, this time by increasing
item homogeneity.


Examining internal test characteristics. One other area which
has led to improvements in the reliability of tests has been
research into the improvement of the definition of the variables being
measured by a test. Research toward this objective has been more
extensive in the area of personality tests than in the development of
achievement tests. The design of personality tests is beyond the scope
of this present study. In view of the scarcity of appropriate research
from the achievement testing area, only two developments in this latter
testing area will be discussed here. First, Ayers (1965) attempted to
validate Bloom's Taxonomy by means of factor analysis from tetrachoric
correlations using programmed instruction in order to control the
teacher variable. His findings in general supported Bloom's notion of
a hierarchical structure. However, the results did not consistently
fit the classification system in the Taxonomy. A more ambitious study
to this same end was conducted by Kropp, Stoker, and Bashaw (1966).
Although their findings were similar to those of Ayers (1965), because
of the illumination their study provides for the construction of
taxonomic tests, it is discussed in detail on page 18 ff. For our
present purposes, it may be sufficient to say that the validity and
the reliability of achievement tests may be improved by using Bloom's
Taxonomy as a guide for developing the items.

Second, Gupta (1968) showed that the reliability in an internal 
consistency sense of an achievement test can be improved if the test is 
subdivided into subtests based on factor analytic results or on the 
basis of the DuBois, Loevinger and Gleser (1952) method of cluster 
analysis. This procedure makes subtests from relatively homogeneous 
items. This present study used a similar approach. 

It should be noted, once again, that these methods tend to
concentrate exclusively on the "right" answers.


Reliability based on part-whole comparisons. A special case
of the alternative forms method of determining the reliability of the
test is the group of procedures which use the correlation of one part
of the test with another. The mathematical limit of the repeated use
of the split-half technique, when certain assumptions are made, is found
in the Kuder-Richardson (K-R) formulae. It is this form of reliability
which increases in the DuBois et al. (1952) procedure.

The Kuder-Richardson procedure is most sensitive to differences
in the variance of the test. For this reason, if error variance is
kept constant, increasing the test variance (as when using a
correction-for-guessing) also increases the reliability. Another
method of increasing the variance is to rewrite the test in such a manner
as to move the difficulty (selection ratio) of each item toward .5
(50%). If the item is a dichotomy, a difficulty of .5 (50%) tends to
maximize the variance, assuming positive correlation, because it
maximizes the probability of choosing either alternative.
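
A brief sketch may make the dependence on variance concrete. The text refers to the Kuder-Richardson formulae in general; formula 20 is assumed here as the representative case, and the response matrix is hypothetical.

```python
from statistics import pvariance


def item_variance(p):
    """Variance of a right-wrong item whose difficulty (proportion right) is p."""
    return p * (1 - p)


def kr20(responses):
    """Kuder-Richardson formula 20 (assumed form).

    responses: one row per examinee, each row a list of 0/1 item scores.
    Population variance of the total scores is used throughout.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = sum(
        item_variance(sum(row[j] for row in responses) / n_examinees)
        for j in range(n_items)
    )
    return n_items / (n_items - 1) * (1 - sum_pq / pvariance(totals))


# Hypothetical data: five examinees, four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(kr20(responses))
print(item_variance(0.5), item_variance(0.3))
```

With the item variances held fixed, any increase in the total-score variance raises the coefficient, and the item variance p(1 - p) is greatest at p = .5, which is the basis of the advice to aim for middle difficulty.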


It is a common suggestion in evaluation texts, for example, that
items should have a middle range of difficulty. This suggestion assumes
that each item should be treated as a "right-wrong" dichotomy. The
plurality of wrong answers is being overlooked when items are treated
as a dichotomy.


Current Practices for Evaluating Achievement Tests--Validity 

In addition to the reliability of a test, it is also necessary
to be sure that a test measures the things it is intended to measure,
i.e. the validity of the test. The validity and the reliability are
related in that the validity of a test can never be higher than the
square root of the test's reliability when the latter is defined in
repeated-measures terms; hence the efforts to increase test reliability.
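
The ceiling which reliability places on validity can be shown with a minimal sketch; the function name and the sample reliabilities are hypothetical.

```python
import math


def validity_ceiling(reliability):
    """Upper bound on a validity coefficient for a given reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability)


# Hypothetical reliabilities and the highest validity each would permit.
for r in (0.49, 0.64, 0.81):
    print(r, round(validity_ceiling(r), 2))
```

A test with a reliability of .64, for example, could not correlate with any criterion above .80, however well it was constructed in other respects.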

The APA Standards lists three types of validity. These are: 

1. Content validity

2. Construct validity

3. Criterion-related validity

The concept of content validity refers to the validity of an
item or test as dependent upon the [illegible] of the item or test
of the information background needed to answer the test. In this
present study most of the necessary information background is
supplied in reading selections embedded in the test.


The construct validity aspect of a test. Construct validity
has several aspects. In brief, construct validity refers to some
psychological construct or constructs (more or less independent of
content) which are included in the test. For instance, if intelligence
as a construct is assumed to be manifested by intelligence measures,
such as the WISC, the correlations between a new group I.Q. test and
the WISC scores on the same subjects could be support for the construct
validity of the new test. In this case the construct would be
"intelligence." Another approach to construct validity is to define
the construct in such terms as to facilitate translation into performance
terms. A good example of this approach is Bloom's Taxonomy (1956).
As already indicated, this procedure should increase the reliability of
the test as well as its validity. These constructs should also be
identifiable in the performance of examinees when the performance data
are subjected to statistical analysis.

Another procedure which strengthens support for constructs for
which measures have not been standardized is "cross-validation." In
cross-validation, statistical analysis should reveal the same constructs
in independent groups. Cross-validation is used in this study for
testing the construct validity of the procedure being explored for
the development of foils (wrong alternatives) on multiple choice tests.

A final aspect of construct validity concerns the degree to
which the examiner's objectives have been accomplished by the test he
has developed. In the absence of [illegible], this accomplishment is
difficult to measure. One approach is to study the distribution of
answers to an item to get clues to its effectiveness. Part of the
discussion in Chapter III on the development of the experimental test
used in this study will elaborate this procedure.
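
Studying the distribution of answers to an item might look like the following sketch. It is not the procedure of Chapter III; the median split into low and high scorers, the function name, and the data are all assumptions made for illustration.

```python
from collections import Counter


def answer_distribution(choices, totals):
    """Tally the alternatives chosen on one item, split by total score.

    choices: the alternative each examinee selected on the item.
    totals:  each examinee's total-correct score on the whole test.
    The split point is the upper median of the totals (an assumption).
    """
    cutoff = sorted(totals)[len(totals) // 2]
    dist = {"low": Counter(), "high": Counter()}
    for choice, total in zip(choices, totals):
        dist["high" if total >= cutoff else "low"][choice] += 1
    return dist


# Hypothetical item keyed "A", answered by six examinees.
choices = ["A", "B", "A", "C", "B", "A"]
totals = [9, 3, 8, 2, 4, 7]
dist = answer_distribution(choices, totals)
print(dist["high"])
print(dist["low"])
```

A foil chosen mainly by low scorers is behaving as intended, while a foil that attracts high scorers, or one that attracts nobody, suggests that the item is not doing what the examiner meant it to do.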

In some cases the construct may be sufficiently well defined
that the different performance outcomes are indisputable. In such
cases the construct validity of a test may be easily determined.
Piaget's discussion of the acquisition of various aspects of
conservation concepts is a case in point. Items measuring the




acquisition of these concepts must conform in their discrimination to
the known characteristics of this acquisition process. Where wrong
answers are concerned, as will be seen on page 27 ff., no such clear
definition exists. The present study, therefore, can be no more than
exploratory in nature.


Criterion-oriented aspects of a test. One of the fundamental
functions of any measurement of achievement is its predictive value for
future achievement. Within the context of the present study one of the
concerns is the ability of the experimental test, which may, by its
construct characteristics, be considered a test of strategies, to
improve the prediction of other achievement test results. Popham
and Husek (1969) point out that most of the statistical procedures used
in current practice may be inappropriate for criterion-referenced tests.


Studies Related to Wrong Answers 

At this point the concept of answering patterns becomes
critical. An answering pattern will, for present purposes, be defined
as some characteristic among the answers selected by a group of 
students which is consistent and stable under statistical analysis, and 
hence leads to an improvement in the validity and reliability estimates 
of the test to which these answers are given. The works already quoted 
suggest that there may be such patterns among "success" performance. 
The question can now be raised as to whether or not there may be 
answering patterns among wrong answers as well. 

A possible source of findings concerning wrong answer information
is diagnostic testing, which has a considerable history. Schonnel
(1943) discusses the cumulative results of more than twenty years of
research. His procedures showed that the nature and location of
mistakes can reveal specific problems, i.e. wrong answers can be meaningful
for diagnostic purposes. Error types are usually established
in advance, and are restricted to items which reflect only one
type of error, thus allowing the items to be scored as right or wrong.
Large numbers of questions are needed for this procedure since, as the
complexity of the problem to be solved increases, the number of tasks
required to diagnose the possible errors increases exponentially, probably
explaining the absence of diagnostic tests in "subjective" subjects.
The Cox and Graham (1966) procedure refines this diagnostic technique.
In order to develop diagnostic tests of this sort, an interlocking
pattern of items is usually designed in such a way that specific weaknesses
in a particular student's performance can be inferred. This procedure
identifies weaknesses on the basis of relationships between items
rather than relationships between alternatives within a particular item.
It becomes evident from the fact that diagnosis can lead to the
identification of specific error types that the four categories of
students' responses made by Shuford et al. (1965) may be an over-simplification.
Furthermore, it would seem reasonable that more than
one error type could be accommodated in one item if a multiple choice
format were used. In this latter case, there should be evidence of
answering patterns among wrong answers.


Answering patterns in foil selection. The evidence supporting
the possibility that there may be answering patterns in the wrong
answers as well as the right ones is sparse. Sigel (1963) reported
with reference to intelligence testing that children tend to "be
consistent within themselves in the errors they make" [p. 53].

Fouldes and Forbes (1965) reported in the manual for their 
revision of the Advanced Set of Raven's Progressive Matrices the 
following finding concerning common errors: 

Four types of common errors could be identified.
(A) Incomplete solutions. There were errors due to people
failing to grasp all the variables determining the nature
of the correct figure required to complete a test item.
Instead they chose a figure which was right as far as it
went but was only partly correct... (B) Arbitrary lines
of reasoning. Here the figure chosen suggests that the
person has used a principle of reasoning qualitatively
different from that demanded by the problem... (C)
Overdetermined choices. These were errors involving
failure to discriminate irrelevant qualities in the figure
chosen... (D) Repetitions. These are errors made by
people who simply selected a figure identical with one of
the three figures in the matrix immediately adjacent to the
space to be filled. [p. 20]

Fouldes and Forbes (1965) did not attempt to show whether or 
not these common errors were more characteristic of some individuals 
than of others. 

These types of error would seem to be more related to some form 
of answering procedure based on the relational characteristics of the 
alternatives rather than on their informational characteristics. 

Powell (1968) factor analysed some wrong answers derived from 
an administration of Gorham's Proverbs Test (1956). This test would 
probably be classified as a comprehension test by Bloom's Taxonomy. A 
wrong answer pattern of four factors resulted. These were:

1. Reduction of information to effect a simplification of the
   statement
2. Addition of irrelevant information
3. Substitution of elements
4. Replacement of proverb by one largely unrelated

If this list is compared with the one by Fouldes and Forbes on
page 16, we find, at least by description, Factor 1 remarkably like
their Class C (Overdetermined choices). Possible relationships between
the remainder are less certain although Factor 4 and Class B may be
related. Their Class D is unlike Factor 3 but is very much like the
"Word-Word Links" class present in the experimental test used in this
study. A definition of this class is on page 38.

However, Sigel (1963) went on to report that there "seemed to
be no relationship between type of error and total score" /p. 53/. In
contradiction to Sigel, Jacobs and Vandeventer (1968) showed, within
the context of Raven's Coloured Progressive Matrices and by using the
Guttman and Schlesinger (1967) facet design, that a relationship
often does exist between right and wrong answers. Ebel (1969) has
shown similar systematic characteristics among True-False items.
Furthermore, Powell and Isbister (1969) showed that some wrong answers
can be related to right answers so as to adversely affect the high
scoring students. The type of foil involved was the "irrelevancy." A
definition of this class is on page 37. An inconclusive trend in this
same direction was found in Factor 4, page 17.

Thus, the available evidence, scanty though it is, suggests
that neither "misinformation" nor "no information" (leading to a
haphazard answer) is sufficient to account for all the wrong answers
given to multiple choice achievement tests, and that in certain
circumstances specific foils may influence the total-correct score.

If wrong answers contain achievement information, then the
wrong answers which display systematic characteristics, and which
acceptably support the construct characteristics of the experimental
test, should improve the prediction of independent achievement scores
for the same examinees.

This improvement should occur in comparison with the prediction
made by either the total-correct scores on the experimental test or
some reasonable subdivision of these scores into subtest scores where
the subtests also fit the construct characteristics of the test.


Studies Related to Item Generation

Perhaps the most ambitious attempt to develop tests reflecting
Bloom's Taxonomy (1956) was the work of Kropp, Stoker, and Bashaw
(1966). These researchers encountered a number of problems in their
work, some of which are discussed here along with the alternative
procedures used in the design of the experimental test used in this
study.

The problems they encountered which are relevant to this study
are:

1. Problems arising from the "Knowledge" category of the
   Taxonomy
2. The generation of Synthesis and Evaluation category items
   in multiple choice format
3. Item analysis problems
4. Problems arising from implicit assumptions in their study

Although the fourth of these problems is probably the most
important for present purposes, the discussion which follows considers
each problem in the order given here.


Problems arising from the "Knowledge" category of the Taxonomy.
Kropp et al (1966) spent some time discussing whether the "Knowledge"
category in Bloom's Taxonomy is a legitimate category and, if so, what
psychological processes other than recall this category might represent.
They further compound the problem by basing their questions on reading
selections supplied in the test. Thus, the legitimate question is
raised as to the meaning of less than a perfect score on "Knowledge"
items when all the information necessary for the answering of these
items is contained in the reading selection used.

These researchers do not comment on the possibility that
"Knowledge" items presented in an "open book" format may not be
"Knowledge" questions in the sense of Bloom's Taxonomy at all. Instead,
these questions, in order not to be obvious, produce a test of search
skills more commonly known in the literature on reading skills as
"reading for details" /cf. Gray, 1960, p. [illegible]/. It is not
surprising, therefore, that an important contributor to the "Knowledge"
category in two of the grade levels is an unidentified factor consisting
in grade nine of "Word Arrangements, Letter Sets, and Symbol
Production"³ /p. [illegible]/ and in grade twelve of "Thing Categories,
Locations, and Gestalt Transformations" /p. [illegible]/. All three of
these tests were positively loaded on the unidentified factor for grade
nine, and the "Locations" test is positively loaded on the unidentified
factor for grade twelve. These unidentified factors add credence to the
suggestion that the "Knowledge" category for the Kropp et al (1966)
tests may well be more related to search-skills than to recall. There is
probably no logical method of testing "Knowledge" as defined by Bloom's
Taxonomy when the information background is supplied by the test. In the
case of the experimental test in this study, no "Knowledge" category
items were generated.

³ These names refer to names of specific tests from the Kit of
Reference Tests (French, Ekstrom, and Price, 1963) which
purport to define particular cognitive aptitudes.


Another point made by Kropp et al (1966) was the difficulty of
generating multiple choice items of the Synthesis and Evaluation
Categories. One of the problems encountered in this respect is the
restriction of a specific category in Bloom's Taxonomy for inductive
reasoning to one subcategory of the Synthesis Category. Another
subcategory adds "unique communication" requirements which are
impossible to meet in a multiple choice format. The third subcategory
involves producing a plan or proposed set of operations. Again, the
open-endedness of this requirement restricts its employment in the
multiple choice format.

Second, if the "Synthesis" category is restricted to induction,
the problem remains that the internal structure of a single reading
selection is usually highly organized. For this reason, the generation
of a large number of items which require an inductive combination from
some components of this selection or an inductive generalization from
these components is very difficult because both of these possibilities
are either explicit or closely implicit in the passage. However, if
more than one reading selection is included in a test of this type, it
would seem to be a relatively simple matter to generate items which
require inductive combinations between selections or inductive
generalizations between selections. This latter procedure is used in
the experimental test in this study.

It is possible that the nature of the strategies employed by
examinees when solving problems has an effect on the effective
classification of the item by the Taxonomy. Two outcomes would be
expected in this case. First, the more familiar an examinee is with the
content of the problem the lower the effective classification of that
problem. Second, the nature of the strategy shifts employed for
generating foils may influence the strategies which the examinee has to
employ to answer the problem, which in turn may also affect the
classification of the problem. For instance, an item classified as
synthetic on the basis of the stem alone or the stem and right answer
may become a comprehension item if the foils stress reading
comprehension. Perhaps the rather surprising apparent dislocations of
the Evaluation items in the Kropp et al (1966) study reflect this
problem. It should be noted that the Evaluation Category occurs in the
second, third, and sixth positions in the ordered Simplex (Guttman,
1954) analysis and additionally in the fifth position by mean score
/cf. Kropp et al, 1966, p. 88/. Greater elaboration of this latter
problem occurs when the implicit assumptions of the Kropp et al (1966)
study are being discussed. Only three Evaluation items are used in the
experimental test because of its length (30 items).


Problems of item analysis on taxonomic tests. Another problem
which Kropp et al (1966) discuss at some length is the problem of item
analysis for tests designed to measure levels in a Taxonomy. The
Taxonomy was developed on the basis of the assumption that each higher
level subsumes all lower levels and adds some unique characteristics of
its own. Thus, as the level of the Taxonomy increases so does the
complexity of the problems which are appropriate to this level. It
would therefore be expected that the difficulty of items designed for
each category would increase as the level of the Taxonomy for which
these items were designed increased. Thus, the selection of items on
the basis of approximate middle difficulty at each level of the Taxonomy
in order to maximize discrimination would seem to be inappropriate.
This subsumption property also implies that if any item were missed at
any level of the Taxonomy, all items designed for higher levels of the
Taxonomy which involve the context of the item missed should also be
missed. As a result, the number of items correct at any level of the
Taxonomy should determine the upper limit of the possible score for the
next higher level.
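
The upper-limit implication of the subsumption property can be
illustrated with a short sketch. This is not a procedure used by Kropp
et al or in the present study; the function name, the score profiles,
and the assumption that every level has the same number of items drawn
from the same context are all hypothetical and chosen only for
illustration.

```python
def violates_subsumption(level_scores):
    """Return the indices of levels whose score exceeds the level below.

    level_scores: items correct at each Taxonomy level, lowest level
    first. Assumes (hypothetically) an equal number of items per level,
    all based on the same information background.
    """
    return [
        i
        for i in range(1, len(level_scores))
        if level_scores[i] > level_scores[i - 1]
    ]


# Two invented score profiles over six levels.
consistent = [5, 4, 4, 2, 1, 0]
inconsistent = [5, 3, 4, 2, 3, 0]

print(violates_subsumption(consistent))    # []
print(violates_subsumption(inconsistent))  # [2, 4]
```

Under a strictly subsumptive Taxonomy, every examinee's profile would
return an empty list; profiles that do not would count against the
subsumption assumption.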

Kropp et al did not test this latter hypothesis in their study
by examining individual performance to see whether or not individuals
who answered a particular Knowledge question incorrectly tended, in
general, to miss all higher level questions related to the same
information background. They did, however, mention that low scorers on
the Knowledge subtest tended to be low scorers on all subtests. An
alternative hypothesis which might be posed is whether or not those
people who misinterpreted a particular knowledge question are more
likely to miss a high level item from the same background if one of the
foils contained the same misinterpretation than if it did not. Although
this latter alternative presents an hypothesis which is beyond the scope
of this present study, it is more in keeping with the possibility of the
influence of systematic choice behavior on response selection as
developed here than is the former hypothesis.

It is true that for dichotomous variables, the discrimination
is maximized for items of middle difficulty. In general, if all
alternatives are to be considered, discrimination is maximized if the
selection frequency for all response alternatives on any item is equal.
Thus, for a four-alternative item, discrimination is maximized when the
difficulty is .25 when all four categories are used. In the case of
forcing a dichotomy on a polychotomous variable, the fact that 50 per
cent of the examinees get the item right means that the distribution of
answers on this item is not the product of chance, at least for the
right answers. The same conclusions may be true for wrong answers, as
Powell (1968) has shown, when higher mental processes are involved.
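
The two maxima described above can be checked numerically. The
following is a minimal sketch and not part of the original study: it
takes the variance p(1 - p) of a right/wrong item as a stand-in for
dichotomous discrimination, and the Shannon entropy of the response
distribution as a stand-in for discrimination over all alternatives.
Both measures, the function names, and the example proportions are
assumptions made for illustration.

```python
import math


def item_variance(p):
    """Variance of a right/wrong (0/1) item with proportion correct p."""
    return p * (1.0 - p)


def response_entropy(proportions):
    """Shannon entropy (bits) of selection proportions over alternatives."""
    return -sum(q * math.log2(q) for q in proportions if q > 0)


# Dichotomous scoring: variance peaks at middle difficulty.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=item_variance)

# Four alternatives: entropy peaks when each is chosen equally often,
# which puts the proportion choosing the right answer at .25.
equal = [0.25, 0.25, 0.25, 0.25]
half_right = [0.50, 0.30, 0.15, 0.05]  # invented proportions

print(best_p)
print(response_entropy(equal), response_entropy(half_right))
```

On these assumptions the dichotomous measure is largest at a difficulty
of .50, while the four-alternative measure is largest for the equal
split and falls when half of the examinees choose the right answer.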

For these reasons, it may be reasonable to ignore item difficulty,
except for very easy or very difficult items, as a criterion of item
selection. At least the former argument with respect to ascending
complexity, and the related ethical problem of predetermination of
hypothetical results, were the basis for Kropp et al (1966) ignoring
item difficulty in the preparation of their tests. The latter argument
with respect to the discriminative power of polychotomous items, except
in extreme cases, was the basis for minimizing the importance attributed
to item difficulty in the present study.

As Kropp et al (1966) point out /p. 77/, an additional problem
with respect to item analysis arises in the interpretation of
correlations on data derived from taxonomic tests. Since the subtests
are assumed to be hierarchically interdependent, the bivariate
distributions of scores between subtests appear triangular, making the
distribution of each higher level skewed further to the right. On this
basis the total-correct score may not be normally distributed for tests
of practical length used on groups of usual size, hence the use of
biserial correlation for the validation of an item against the total
test score may be inappropriate. This fact, as they point out, also
raises problems in the interpretation of any correlation coefficient in
their study. When determining the discrimination coefficient Kropp et al
(1966) used the traditional procedure.


Problems which arise from implicit assumptions in the Kropp,
Stoker, and Bashaw study. The basic assumption of the Taxonomy is
that each higher level subsumes all lower levels and adds
characteristics of its own. For this reason, Kropp et al (1966)
approached their analysis with the implicit assumption that the
complexity dimension was characteristic of the Taxonomy as a whole
rather than being a characteristic of each level or subcategory within
the Taxonomy. The results of their findings with respect to this
assumption were inconclusive. Analysis of the subtest scores showed that
the order of the levels of the Taxonomy as a hierarchy did not fall
consistently into the order hypothesized. On the other hand, Powell and
Isbister (1969) tested the assumption that hierarchical categories
should be obliquely related. Their finding, however, was that the use of
a promax rotation did not improve the resolution of the factors when
right and wrong answers were combined; thus the expected obliqueness did
not occur.

It has already been indicated on page 21 that the Evaluation
category can occupy most positions above the Knowledge level. The
Kropp et al (1966) study also found that on the basis of cognitive
attributes no single category is consistently defined for all grade
levels tested although the tasks themselves were identical for all
grade levels. These two findings of Kropp et al are inconsistent with
the Taxonomy as defined. Perhaps the Taxonomy is actually a
description of some of the strategies employed by humans in
problem-solving situations. There may be a hierarchical order to these
strategies but they may not be taxonomic in Bloom's sense of the term.

A problem which is a synthesis problem for a five-year old may be
a comprehension level problem for a twelve-year old. In this context
two deviations from taxonomic order would be expected with Bloom's
Taxonomy. First, each category of the Taxonomy, with the possible
exception of "Knowledge," should be characterized by a range of
complexity levels within the category in addition to an order of
complexity levels between categories. In such circumstances, as Kropp
et al (1966) demonstrate, a wide range of possible orders may occur
among specific samples from category levels. In addition to the most
common, and expected, order of the categories found in their study, the
categories occur at least once in any one of three other orders under
Simplex analysis.

Second, the strategies involved at different developmental
stages will vary in accordance with the information and strategy
backgrounds of the individuals at these stages. For this reason,
striking dissimilarities in the cognitive attribute analysis on the
basis of the Kit of Reference Tests (French, Ekstrom, and Price, 1963)
for any category would be expected at different developmental levels.
This is precisely what Kropp et al found. An important characteristic of
this change should be its movement toward simplicity. For instance, if
we reclassify the Kropp et al (1966) "Knowledge" category as a "search"
category on the basis of the Undefined factor, we find that for grade
nine the positively correlated cognitive aptitudes are Word Arrangement,
Letter Sets, and Symbol Production, which suggests that the grade nines
may be generating their search strategies as they proceed with the test.
For the grade twelves the positive attribute is Locations, which
suggests a more simple and direct approach.

Another factor which Kropp et al (1966) discussed is that the
difficulty of a problem may be affected by the complexity of the problem.
It also may be affected by the familiarity or obscurity of the
information background and/or strategies required by the problem solver.
It may also be affected by the nature and the fineness of the
discriminations which the solution to the problem requires. This latter
aspect may be related to the nature of the foils. Kropp et al (1966)
deal only briefly with the difficulty problem /cf. pp. 90 and 159/.


Contributions of the Kropp, Stoker, and Bashaw Study to the 
present study. It may be possible to assume that Bloom's Taxonomy is 
not a subsumptive taxonomy. In this case, the Kropp et al (1966) study 
more strongly supports the possibility of the transcendence of process 
over content than their interpretation of their findings suggests. This 
transcendence of process over content has also been supported by Furth 
(1966) in his work with the congenitally deaf. 

In combination with the other research already discussed
(see p. 16) there would seem to be at least three variables which
contribute to the choice behavior of examinees on multiple choice
achievement tests. These are: 1) content, 2) process, and 3) effective
complexity. A fourth possible variable is item difficulty (see p. 26).
Since misinterpretation of content and inappropriate selection of
strategies might both be expected to lead to the selection of an
inappropriate response, it is reasonable to assume that at least some
students will display systematic wrong-answer selection. Hence, in a
forced-choice situation the nature of the alternative choices provided
would be expected to influence the nature of the selections made. If
foils are deliberately designed to reflect probable misinterpretations
of content, or probable inappropriate choice of strategy, more than
the "right" answers might be used to determine the present achievement
status of the examinee.

How can tests which meet these criteria be developed? It is
fairly clear from the Kropp et al (1966) study that Bloom's Taxonomy is
useful as a set of guidelines for the construction of the relationship
between the stem and the right answer for each item. A discussion of
common recommendations for the development of the foils for each item is
presented in the following section.


Recommendations for Construction of Foils 

The following discussion reviews what some textbook authors
have had to say to teachers about the construction of foils for multiple
choice items. Among these authors, Ross and Stanley (1954) list
fourteen rules for the construction of multiple choice items. Of these
only two deal specifically with foil (distractor) construction.

6. Make all responses plausible
9. To measure higher levels of understanding, increase the
   homogeneity of the options provided /p. 185/.

These authors do not define plausibility, and the example they
use for increasing the homogeneity of options actually illustrates
increasing the content specificity of the item. Their second suggestion
involves increasing the fineness of discrimination between alternatives
which may be more related to the difficulty of the item than to "higher
mental processes."

As another example, Thorndike and Hagen (1961) in their second
edition list ten "maxims for multiple choice items." Four of these
have direct bearing on foil construction. Quoting the original we
find (Italics in original):

4. Be sure that There is One and Only One Correct or Clearly
   Best Answer.
5. Beware of Clang Associations.
8. Beware of the Use of One Pair of Opposites as Options If [...]
9. Beware of the Use of "None of These," "None of the Above,"
   and "All of the Above" as Options. /pp. [illegible]/


Whether or not there should be more than one "correct" answer
will depend upon whether or not the examiner wishes to discriminate
between levels of insight into a particular problem as in the "best
answer" type of test. However, to make such discriminations may require
the use of information from more than one alternative of any item.

One of Hoffmann's (1962) most damning criticisms of the multiple
choice types of tests arises from the arbitrary assignment of only one
alternative of the response set to the "right" category in tests of this
type in such a way as to discriminate against the thoughtful, well
informed student. This admonition is only appropriate if we are to
assume that the only answer to be taken into account for any particular
item is the one designated as "right," whereas the "rightness" may be
arranged on a continuum in the "best answer" type of test.

Thorndike and Hagen (1961) quite rightly point out that Clang
Associations (see number 5, p. 28) between stem and right answer tend
to give the answer away. However, using superficial associations
between the stem and the wrong answers may in some circumstances be an
effective discriminating device (see "Word-Word Link," p. 38).

Thorndike and Hagen's (1961) alternatives numbered eight and
nine (see p. 28) are interesting in that they suggest that certain
aspects of the logical relationships between answers and foils should
be considered in foil construction. If a student selects an answer
belonging to a set described in number five (see p. 28) that is
logically opposite to the "right answer," then this selection in itself
may contain useful information. Such a selection reveals at the very
least which students completely misunderstand the relationship in
question. Why this sort of alternative should be discarded without
qualifications is therefore not clear. The criticism these authors
make of the "all of these," "none of these" type of alternatives has a
similar basis. They neglect to say that if "none of these" [text
illegible] stem it may be regarded as being logically equivalent to an
omission. The "none of these" provides a noncommittal response which has
the effect of making closed-choice alternatives into open-ended
alternatives. For some purposes it may be useful to know if the student
made one of the less common errors, if there are more possible errors
than the foils account for. In addition, omissions at the end of the
paper can also mean "not finished." Since there is more than one
possible reason for omitting an item, interpretation of an omitted
response becomes ambiguous. For these reasons, the basis upon which a
student makes a noncommittal response may be a valid question for study.

More recently, Ebel (1965) lists 48 "suggestions for preparing
good multiple choice test items." Of these 48 only five directly
relate to foils or distractors. He also rates these as "desirable"
or "undesirable." Quoting the original:

32. Item using true statements as distractors. (Desirable)
33. Item using stereotypes in distractors. (Desirable)
34. Item using obscure distractors. (Undesirable)
35. Item using a highly implausible distractor. (Undesirable)
36. Item involving verbal trick. (Undesirable) /pp. 183-185/

The first two of these are examples of the use of errors in
logic which Sanders /1966, p. 104/ suggests we teach the students to
recognize, but does not elaborate on, with respect to measurement.

Ebel's (1965) suggestion numbered 34 immediately above proposes
that the use of obscure [illegible] vocabulary is undesirable. On the
contrary, if the intention of the examiner is to study responses to
obscure, ambiguous or complex situations, this type of item may be
desirable. Although other methods may have certain advantages when
measuring complex human behavior, the multiple choice method retains two
particular advantages. First, a high level of control can be maintained
in the alternatives supplied so that the "controlled sample of
performance" characteristic of all tests can be very explicit. Second,
once the performance components of the complex behavior which is to be
observed have been established, accurate counts of the frequency of the
choices which fit the categories of alternative (whether right or wrong)
designed to measure these components is a simple matter. Other
measuring instruments have other advantages at the expense of these two.
The study of such items would probably necessitate examining all
responses to each item. Thus, Ebel's (1965) proposal that this type of
item is undesirable can be considered valid only if the "one right
answer" assumption is considered valid. Closer scrutiny of this entire
problem seems reasonable.
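
The counting step described above, in which every chosen alternative
(right or wrong) is tallied by its category, can be sketched as
follows. This is an illustration only and not the scoring procedure of
the present study: the item key, the function name, and the example
responses are hypothetical, and the class labels are borrowed loosely
from the "irrelevancy," "Word-Word Link," and "Others" classes
mentioned in the text.

```python
from collections import Counter

# Hypothetical key: for each item, the class of every alternative.
ITEM_KEY = {
    1: {"A": "right", "B": "irrelevancy", "C": "word-word link", "D": "other"},
    2: {"A": "word-word link", "B": "right", "C": "other", "D": "irrelevancy"},
    3: {"A": "irrelevancy", "B": "other", "C": "right", "D": "word-word link"},
}


def category_profile(responses, key=ITEM_KEY):
    """Count how often an examinee chose each class of alternative.

    responses: mapping of item number -> chosen letter, or None for an
    omitted item.
    """
    profile = Counter()
    for item, choice in responses.items():
        if choice is None:
            profile["omitted"] += 1
        else:
            profile[key[item][choice]] += 1
    return profile


# One invented examinee: two word-word link foils and one right answer.
examinee = {1: "C", 2: "A", 3: "C"}
print(category_profile(examinee))
```

A total-correct score uses only the "right" count from such a profile;
the remaining counts are the wrong-answer information that the
"one right answer" assumption discards.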

In suggestion numbered 35 (p. 30), relating to implausibility,
the problem of a definition for plausibility arises once again.
Plausibility may be a function of the rationale used in determining the
construct and content validity of the test. The examiner must be able
to anticipate what alternatives may be plausible to the examinees.
Without a definition of plausibility, implausibility is impossible to
determine. In fact, plausibility is often defined on a post hoc basis
from the item analysis with foils having a low selection ratio being
classified as "implausible." However, if the purpose of discrimination
is to identify individuals for differential treatment, a foil which
identifies ten or twelve out of 1,000 students may be more valuable than
one which identifies 250 students.

Finally, many foils which seem to involve a "verbal trick" may
have a valid function. These verbal tricks are probably of three kinds.
The first type could be the introduction of a peculiarity of wording
designed to produce interpretive or misreading errors on the part of
some students. The second kind of "verbal trick" is found in such
things as Zeno's Paradoxes (c. 340-264 B.C.) in which the "verbal
tricks" involve a faulty assumption in the reasoning. The third kind of
"verbal trick" introduces the possibility of detecting in the examinee
an inappropriate "set" for the correct solution of the problem. Both
the Einstellung effect and "functional fixity" may possibly be used to
develop examples of foils for this type. In each of these cases it is
conceivable that the information generated from response to these types
of item could have discriminative value. The issue here, once again,
is both content and construct validity. Does the "verbal trick" give
the intended information, or interfere with the obtaining of this
information?


We find this same ambiguity of advice prevailing throughout the
range of standard texts in this area. From the ETS booklet Multiple
Choice Questions: A Close Look (1963) through to such writers as
Ahmann and Glock (1963), Gronlund (1965) and Noll (1965) we find the
great bulk of the suggestions about item writing discussing the
functional, linguistic, and structural characteristics of the stem, and
stem-right answer relationships, with only minimal and often
contradictory treatment of the foils and how to construct them.


On the basis of the above discussion we can identify several
general bases for foil construction as presented to constructors of
multiple choice tests. These are:

1. Logical Relationships
2. Logical Errors
3. Partial Information
4. Misinformation
5. Obscure Relationships
6. Misunderstanding
7. Verbal Tricks

Not all of these are regarded favourably by the authors 
mentioned nor are these bases consistent with themselves or between one 
author and another. It would seem that the recommendations have been 
developed on a trial-and-error basis derived from the experience of 
professional test constructors during their attempts to meet the
statistical criteria of an "effective item."


The Possible Value of the Experimental Test

The testing technique being explored in this study hypothesizes
that Bloom's Taxonomy adequately describes the strategies leading to
right answers, and that a set of logically based guidelines for foil
development effectively describes some of the possible systematic
deviations from the ideal outcomes of these strategies. These two
facets combine to form the construct characteristics of the testing
technique under study. Of course, any findings from a purely
exploratory study must be tentative. However, wrong answers from a
"strategy" test may increase the predictive power of that test for total
achievement scores (found in the usual way) from independent achievement
tests.
In this case, more than the information background of a test may be 
involved in "success" on an achievement test. Such findings would 
strengthen the support for the hypothesis that process may transcend 
content. Furthermore, this study may suggest some of the typical types 
of errors students may make as they mature intellectually which might 
eventually lead to the establishment of a behavioral description of 
development which is independent of test content, and of educational 
strategies which may be appropriate to the stages and phases of this 
developmental sequence. 

On the application side, the main advantages of guidelines for
foil construction would be expected to involve the 1) simplification
of item writing, 2) clarification of why a foil is wrong, and 3)
possibility of producing diagnostic tests in subjective content areas.

CHAPTER III
DESIGN OF THE EXPERIMENTAL TEST


From the discussion developed in Chapter II, the usefulness of 
Bloom's Taxonomy for the development of process-oriented items was 
suggested. The possibility that Bloom's Taxonomy may not display the 
assumed subsumptive characteristic between categories does not minimize 
its role relative to the establishment of the construct validity of a 
test. The evidence presented suggested that there is no similar set of 
internally-consistent construct guidelines for the development of foils. 
seven general categories of foil based upon recommendations from the 
literature could be established. Using these seven as a starting point 
the first task in this chapter is to develop a systematic set of Guide- 
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lines which may prove helpful Lor Old, constmne 
egories can be further reduced. Perhaps the most important category 
involving strategies are those foils which can be based on probable 
errors in logic made by the examinee. Partial information can lead to 
an error in logic if the wrong strategy is used to generate the missing 
information. It can also lead, along with other causes, to an
oversimplification of the problem. Since only the product of the
choice-behavior is observable on a multiple choice test, it would be reasonable
to include Logical Errors, Partial Information, some Verbal Tricks, and 
perhaps some Logical Relationships (like answers which are opposite to 
the best ones) in a list of categories of foil generation where higher
mental processes are to be tested. 

Misinformation and misunderstanding may be identical or they may
be different in that the misunderstanding may be related to the reading
of a specific item or group of items rather than to a weakness in the
information background of the examinee. If the examinee succumbs to
certain kinds of verbal tricks (for example, the use of meaningless
jargon in a foil) his problem may be more immediately test-related than
background-related provided that he is not misled elsewhere on the test
when jargon is not used. In the present context there is the
possibility that in addition to the process variables there may be a class
of foil related to the linguistic characteristics of the item. This
class of foil may be designated as a "Misreading" class.

Another possibility is that the examinee has systematically
misclassified a particular piece or set of information. In items based
on the possible logical relationships among the total information
background this piece of misinformation will lead to the systematic
selection of specific wrong answers each time this misclassification
appears in a foil. An instance is the person who confuses the work of
Hebb with the work of Hull. Foils of this type, and of several others,
are beyond the scope of the present study and are, therefore, classified
in the "Others" class. Subsequent research may be expected to elaborate
this latter problem.


The Guidelines for Foil Construction 
Our earlier discussion showed that similarities can be found
between common errors on nonverbal tests and verbal tests (see p. 17).
The present experimental test in its original version was based on
[...] involving logical fallacies and logical relations. The
test has been revised in an attempt to improve item discrimination for
the present study. The reading selections, general questions,
and overall format remained unchanged. The Guidelines which follow
were used to revise the foils.

Since the definitions of foil classes, as they were originally
used, tended to lack precision, they were redeveloped for this study.
The Guidelines are described below (see p. 37) in terms of the
procedure used for constructing each type of foil. Four classes were
produced:

1. Strategy class; the largest group of Guidelines to be
developed for this study is based on the logical
characteristics of the foil relative to the right answer
and information background. Because these types of foil
are suggestive of incorrect analytic procedures they are
collectively referred to as the "strategy class."

2. Misreading class; this group of foils is based on semantic
characteristics of the foils relative to the right answer
and information background. The nature of this test,
i.e. an open book test, would be expected to reduce the
possible number of foil categories in the misreading class
because an examinee who feels he has misread an item can
refer directly back to the information background supplied.
This class of foil probably has many more members which
would describe different aspects of misreading where
information recall is the source of information. An
example of this situation is the Jargon (J) category
(see p. 39).

3. Other; the foils in this class are unclassifiable, at least
by the present Guidelines. Future studies are expected to
reduce but not eliminate this class of foils.

[...] examination (i.e. an open book examination) precludes the
development of misinformation foils which would be expected
to occur in the context of a test requiring information
recall. These would be expected to be related to "Knowledge"
level items, a level of item which was not used in this
examination for reasons already discussed (see p. 19 ff).


The first two of these major classes may be subdivided on the
basis of a specific description of how a foil which fits any particular
category is produced. This subdivision follows:


Guidelines

A. Strategy Class

1. Overgeneralization (OG). In the development of this type of
foil the author retains the correct relationship of the
best answer in its entirety and adds some irrelevant
information. (For example, see item 1A, p. 154).

2. Oversimplification (OS). In this case the author omits one
or more parts of the best answer. (For example, see
item 2, p. 156).

3. Inversion (Inv). In this case the author makes a statement
in some way opposite to the best answer. (For example,
see item 3C, p. 158).

4. Irrelevancy (Irr). In this case the author makes a true
statement which is unrelated to the best answer, or a
statement which could be a correct answer (perhaps by
virtue of some restriction in the stem). (For example,
see item 1D, p. 154).

5. Invalid Assumption (IA). In this case the author begins
with an unwarranted assumption about the background or
solution to the problem and thus writes a foil which would
be correct if this assumption were valid. (For example,
see item 1C, p. 154).

6. Substitution (Sub). In this case the author replaces at
least one of the elements or the relationships of the best
answer by a corresponding element which is less acceptable.
(For example, see item 2B, p. 156).

7. Transposition (Tr). In this case the author modifies the
order of the elements in an ordinally dependent relationship.
(For example, see item 30C, p. 181).

8. Common Misconception (CM). In this case the author utilizes
his knowledge of the probable common misconceptions held by
the examinees to write the foil. (For example, see item 5B,
p. 159).


B. Misreading Class

1. Word-Word Link (WW). In this case the author produces a
false statement which has strong verbal links with the stem
or background information by either repetition or association.
This type of foil may be similar to Foulde's (1965)
Class D error (see p. 16). (For example, see item 7B, p. 162).

2. Redefining of Terms (RT). In this case the author uses a
word or words in the foil in a different literal or
connotative sense than that used in the stem or background
information. (For example, see item [...]). This type of foil
misleads certain of the best students, perhaps the more
imaginative ones (see p. 18).

3. Jargon (J). In this case the author produces a quasi-meaningful
statement which tenuously relates in some manner
to the best answer. The use of coined "near words" may also
be present. (Not used in experimental test; see p. 36).

C. Others

1. "Others" (O). In this case the foils are, at present, for some
reason, unclassifiable.

These are the Guidelines which were used in the construction of
the foils in the experimental test.


As already mentioned, Bloom's Taxonomy was used as a guide for
the construction of the stem and right answers of the experimental test.
The interrater reliability between judges for the advance classification
of right answers was reasonably high (r = .83). The Guidelines just
given on pages 37-39 were used to construct the foils. The interrater
reliability for foil classification was somewhat lower (r = [...]).

As in the case of Bloom's Taxonomy for the right answers, the 
Guidelines presented the immediately evident advantage of increasing the 
number of possible foils which could be considered for any one item, 
making foils easier to generate than they were in the more usual
"hit-and-miss" method. An additional advantage of the Guidelines became
evident after the earlier administration of the test. The Guidelines
help clarify the basis for why any foil should be considered wrong. The
absence of such a basis is a common weakness of teacher-made tests.

The examination consisted of five short reading selections drawn
from material which was in some way related to educational psychology
since this was the central topic of the course in which this examination
was to be used. They were also chosen on the basis that it was
relatively unlikely for the examinees to have encountered the works from
which these selections were drawn in their previous training. To the
extent that these selections were specifically oriented to the
vocabulary of the studies of psychology and education, this test demanded
information recall from the examinees. Aside from this restriction, it
was assumed that all items could be answered correctly solely upon the
basis of the information given in these selections. This assumption
may not have been entirely justified.

Since most of the necessary background information was assumed
to have been supplied in the test, no Knowledge category items were
generated. On this basis, the test was intended to be a "higher mental
processes" test. Since the major emphasis of the test involved logical
analysis, it was assumed that the test was essentially an "Analysis"
level test. The findings of the preliminary version as reported in
Powell and Isbister (1969) confirm this assumption.


Content and Construct Characteristics of the Experimental Test 

A detailed item-by-item discussion of the test may be found in
Appendix B (see p. 153 ff). In brief, five reading selections related
to the area of educational psychology were chosen on the basis of
information density and the unlikelihood of the examinees having
encountered the selections previously. These selections, which are
both given and referenced in Appendix B, are referred to subsequently as:

1. Stupidity

2. Awareness

3. Aggression

4. Discipline

5. Progress

The 30 items in the test were classified using Bloom's Taxonomy
as a construct model as indicated in Table 1 (p. 42) and elaborated in
the discussions in Appendix B. No Knowledge-level items were developed.
Items were classified as Synthesis if they required the examinee to
organize the material from more than one reading selection into some
systematic relationship when deciding which alternative to select for an
answer. A reasonably high interrater reliability (r = .83) was found for
the classification of the items based upon the item format, the stem, and
the stem-right relationship when the right answer was indicated.
Disagreement occurred among several judges and among other reviewers of
this study on the keying of some of the items. This disagreement would
be expected from the subjective nature of the content and the related
differences among the value systems inherent in any [...].

Much less agreement among raters was found for the foil
classification (r = [...]). This poor result was expected for the
reasons already given (see pp. 4-5), and from the findings of Kropp et al.
(1966) which suggest multiple interpretations of specific items and
alternatives within heterogeneous groups. This multiplicity would be
expected to increase with the complexity and subjectivity of the content
so that a high level of agreement, even among professionals on the
particular test used in this study, would be unlikely.

5 A word of caution is in order. This low interrater reliability
suggests that readers following the item-by-item discussion in
Appendix B may disagree with the foil classification given and
with the reasoning behind it. It would be interesting for the
reader to record his disagreements and to compare these with the
results of the cluster analysis as given in Table 11, page 73,
and Table 12, page 77.

TABLE 1

CLASSIFICATION OF ITEMS
USING BLOOM'S TAXONOMY
(30 ITEMS)

Bloom's Category   Comprehension   Application   Analysis   Synthesis   Evaluation
Item Numbers       [item numbers not recoverable]
Totals             4               3             14         6           3

To illustrate the extent of this problem, a check was conducted.
One of the raters of the items disagreed with the classification of
three foils in particular. Of these three only one of his
reclassifications was supported by the cluster analysis as given in the results of
the study (see p. 182 for details). This one-in-three success ratio
was equivalent to that of the experimenter.

The overall appearance of the experimental test suggests that in
the traditional sense it is a very poor one. The internal consistency
is low [...]; the evidence from the item difficulties
and biserials in Table 40 of Appendix A (p. 150) is equally
discouraging. However, the use of Bloom as a model for the right answers and
the Guidelines for the wrong answers suggests that the test should not
be considered homogeneous. For this reason, and the reasons given
earlier when discussing this same problem relevant to the Kropp et al.
(1966) study (see p. 21 ff), the use of traditional evaluative
procedures on this test may be questionable. Support for this position is
found in the Procrustes rotation of the factors to fit the clusters,
which gives six nearly orthogonal factors which display quite adequate
internal consistency (see p. 70).

The foil classification procedure differed from the item
procedure in two important respects. First, although the Guidelines
were also used as a model for the possible information content of the
foils, the relationship between this model and any possible
characteristics of the examinees, in the almost total absence of research, was
largely unknown. Second, whether these Guidelines formed mutually
exclusive categories, or a hierarchy paralleling Bloom, could only be
inferred from the assumptions which went into their development.

These foil Guidelines were used to help develop the foils as
well as to classify them. A summary of the classification of the foils
is given in Table 2.


TABLE 2


CLASSIFICATION OF WRONG ANSWERS
USING THE FOIL GUIDELINES

[Table body not recoverable.]

The test used in this study is a revision of the one reported
in Powell and Isbister (1969) which had a slightly different purpose.
The present discussion, supported by the item analysis given in
Appendix B, would seem to demonstrate that, for all the faults of the
instrument, the content and construct requirements for this test as
laid out in Chapter II have been met to a reasonable degree of
acceptability.

In the Powell and Isbister (1969) study the advance
classification was taken as given and profile scores were developed
accordingly. The resulting score sets were treated as independent
variables and subjected to principal axis factor analysis in order to
determine relationships among these scores. In this study the advance
classification was not taken as given but subjected to a comparison
with a cluster analysis based upon the relationships found among each
of the alternatives. In this present study the acceptability of the
advance classification system as exemplified in the test was being
studied.

On the basis of what has already been said about the problems
that communications of this type produce, it would be reasonable to
expect that the advance classification systems used in this study would
not hold up without the qualifications derived from the possibility of
1) multiple interpretations of the communicating stimulus, 2) multiple
methods of integrating and relating the stimulus to each individual's
own experience, and 3) the resulting multiple interpretations of the
responses.


CHAPTER IV


THE DESIGN OF THE STUDY


The success of this study is contingent upon three aspects.
First, the study must stand upon the acceptance of the logic of the
content and construct validity of the experimental test as given in
Chapter III.

Second, the construct validity must find evidential support in
the statistical results of the analysis of the examinee performance on
an administration of the experimental test. This support can be found
in several ways. First, the advance classification may be found to
reappear in the statistical patterns. Second, the content pattern might
be shown not to be an important contributor to the statistical patterns.
Third, in the event that the advance classification cannot be supported,
some reasonable method of modifying the advance classification which does
not violate the construct assumptions, such as possible multiple
interpretation of the items, may be found. Fourth, the patterns should
cross-validate between equivalent independent groups. Fifth, if
cross-validation fails, a reasonable explanation which fits the data and the
construct assumptions must be found to explain this failure.

Third, however much the construct validity is supported, wrong 
answers in some form must also contribute significantly to the prediction 
of achievement scores obtained in the usual manner before they can be 
considered to contain achievement information. 

These three aspects form, in combination, the necessary and
sufficient conditions needed to demonstrate that the method of test
construction used in this study can be used to develop tests which
contain useful information about student performance in the answers
given to the foils. A further restriction to this problem arose. Since
the study began with categorical data, it should end with categorical
interpretations in so far as is possible.

To begin with, however, the answer selections on the
experimental test cannot be assumed to have any of the usual continuous
distributions. The selection pattern can be considered to be
categorical, since one choice is made for each item, but not dichotomous.

An expedient method of defining categorical data mathematically
is to treat categorical membership as "one" (1) and nonmembership as
"zero" (0). A matrix of categorical data should have the following
properties:


1. The centroids of normalized clusters from the matrix should
tend to be either orthogonal or opposite each other.

2. The orthogonal projections of the members of a cluster upon
its centroid should be near unity.

3. The orthogonal projections of the members of a cluster upon
the centroids of all other clusters should either be
nonexistent or nonsignificant.
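
The second and third properties can be illustrated with a minimal sketch. It assumes hypothetical two-dimensional unit vectors standing in for normalized cluster members; the vectors and the helper names `unit`, `centroid`, and `projection` are illustrative only and are not taken from the data of this study. Under that assumption the projection of a member upon a centroid is simply their dot product.

```python
import math


def unit(vector):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector]


def centroid(members):
    """Normalized mean of a cluster's member vectors."""
    dims = len(members[0])
    mean = [sum(m[d] for m in members) / len(members) for d in range(dims)]
    return unit(mean)


def projection(member, center):
    """Dot product of a unit member vector with a unit centroid."""
    return sum(a * b for a, b in zip(member, center))


# Hypothetical clusters of normalized vectors (illustrative values only).
cluster_one = [unit([1.0, 0.0]), unit([0.96, 0.28])]
cluster_two = [unit([0.0, 1.0]), unit([-0.28, 0.96])]

center_one = centroid(cluster_one)
center_two = centroid(cluster_two)

# Property 2: members project close to unity on their own centroid.
own = [projection(m, center_one) for m in cluster_one]
# Property 3: members project only weakly on the other centroid.
other = [projection(m, center_two) for m in cluster_one]
# Property 1: the two centroids are orthogonal in this constructed case.
between = projection(center_one, center_two)
```

How small a projection must be to count as "nonsignificant" is not specified here; any cutoff applied to `other` would be a further assumption.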

Figure 1 illustrates a typical response matrix which displays
these properties for the twelve variables included, and may display
these properties for some reduction of the matrix to fewer than twelve
variables. Figure 1, a sample response matrix, is on page 48.

The usual procedure for test analysis is to use the right 
answer portion of Part B and Part C (the total number correct) and 
to treat the wrong answer division of Part B as redundant. 


[Figure 1: A SAMPLE RESPONSE MATRIX]

If all four alternatives of each item are considered the
statistical problem of linear dependency arises. To illustrate what
is meant by linear dependency refer to Figure 1 above. Notice that in
Part A of this figure the sum of each row is always three. Only in the
case of omitted items will the sum of the answers be less than the
number of items.

Since all alternatives are being counted, this total is 
predetermined as being the number of items. If the columns are added 
vertically, the sum of the columns within an item is predetermined at 
the number of examinees. In the case of Figure 1 above, this sum is 5. 
In many statistical procedures, linear dependencies have the effect of
rendering solutions indeterminate or non-unique.

The solution to this problem used in this study was to partition
the matrix as indicated in Part B of Figure 1 (see p. 48). Two
matrices, one for the right answers and one for the wrong answers, were
made from the original response matrix. The categorical property was
retained within each of these two new matrices. This procedure had the
effect of treating right and wrong answers as though they were
independent.

Part C of Figure 1 (see p. 48) shows the row sum (horizontally)
of the right answer partition of Part B. This sum, which is the
total-correct score, is the usual approach to the interpretation of test
results. It is with Part C that the results of the statistical analyses
of Part B are being compared.
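
The linear dependency and the partition described above can be sketched as follows. The answer data, the scoring key, and the helper names `response_matrix` and `partition` are hypothetical; only the dimensions (five examinees, three items, four alternatives) follow the description of Figure 1, so this is an illustration under those assumptions and not a reproduction of the figure.

```python
N_ALTERNATIVES = 4


def response_matrix(answers, n_alternatives=N_ALTERNATIVES):
    """One row per examinee, one 0/1 column per (item, alternative)."""
    matrix = []
    for chosen in answers:
        row = []
        for choice in chosen:
            row.extend(1 if alt == choice else 0 for alt in range(n_alternatives))
        matrix.append(row)
    return matrix


def partition(matrix, key, n_alternatives=N_ALTERNATIVES):
    """Split the columns into keyed (right) and unkeyed (wrong) alternatives."""
    right_cols = {item * n_alternatives + alt for item, alt in enumerate(key)}
    n_cols = len(matrix[0])
    right = [[row[c] for c in range(n_cols) if c in right_cols] for row in matrix]
    wrong = [[row[c] for c in range(n_cols) if c not in right_cols] for row in matrix]
    return right, wrong


# Hypothetical data: 5 examinees, 3 items, alternatives coded 0-3 (A-D).
answers = [
    [0, 1, 2],
    [0, 3, 2],
    [1, 1, 2],
    [0, 1, 0],
    [2, 1, 3],
]
key = [0, 1, 2]  # hypothetical keyed alternative for each item

full = response_matrix(answers)

# Every row sums to the number of items (no omitted items here).
row_sums = [sum(row) for row in full]

# Within each item, the four columns together sum to the number of examinees.
item_sums = [
    sum(sum(row[i * N_ALTERNATIVES:(i + 1) * N_ALTERNATIVES]) for row in full)
    for i in range(len(key))
]

# Partition into right-answer and wrong-answer matrices.
right, wrong = partition(full, key)

# Row sums of the right-answer partition give the total-correct score.
total_correct = [sum(row) for row in right]
```

With 30 items of four alternatives each, the same partition would give 30 right-answer columns and 90 wrong-answer columns.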

There are several possible methods of dealing with categorical
data. Since this study is concerned with relations among categories
the most reasonable approach is to begin with phi correlation
coefficients between the category pairs. This procedure produced two
correlation matrices, one 30 by 30 for the right answers, and one 90
by 90 for the wrong answers.
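
As a minimal sketch of the computation, the phi coefficient between two 0/1 columns can be obtained from the four cell counts of their two-by-two table. The function names and the sample columns below are hypothetical; the formula is the standard one for phi, which equals the product-moment correlation of the two dichotomous variables.

```python
import math


def phi(x, y):
    """Phi coefficient between two 0/1 variables of equal length."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0:
        # Undefined when an alternative is chosen by everyone or by no one.
        return float("nan")
    return (n11 * n00 - n10 * n01) / denominator


def phi_matrix(columns):
    """Square matrix of phi coefficients between every pair of columns."""
    return [[phi(a, b) for b in columns] for a in columns]


# Hypothetical selections of two alternatives by six examinees.
alt_x = [1, 1, 0, 0, 1, 0]
alt_y = [1, 0, 0, 0, 1, 1]

coefficient = phi(alt_x, alt_y)
matrix = phi_matrix([alt_x, alt_y])
```

Applied to the columns of a right-answer or wrong-answer matrix, `phi_matrix` would yield a square matrix with one row and column per alternative.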

Since the results of these analyses were to be cross-validated,
the original group of examinees was subdivided by random assignment
into two groups (Group A and Group B). The data for both groups were
subjected to the same statistical treatment although most of the
interpretive work was done with the results from Group A.
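
The mechanism of the random assignment is not described. One simple way to produce such a split, sketched here under the assumption of a shuffled list cut in half, is shown below; the function name `split_groups`, the seed, and the use of integer identifiers are all hypothetical. The count of 277 examinees is the figure reported in the description of the sample.

```python
import random


def split_groups(examinee_ids, seed=0):
    """Shuffle the identifiers and cut the list into two near-equal groups."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = list(examinee_ids)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2  # the first group takes the odd one out
    return shuffled[:half], shuffled[half:]


group_a, group_b = split_groups(range(277))
```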

The result of this latter subdivision was that the analytical
aspects of this study began with four phi coefficient matrices (one for
right answers and one for wrong answers for each of Groups A and B).
These four matrices were the basic data for much of this study. They
may be found in Tables 32 to 39 of Appendix A.


The phi matrices gave relationships among pairs of alternatives
only. To proceed further, it became necessary to find relationships
among these relationships. From the original structure of the
experimental test there were two patterns of relationship which could be
sought. The first was the pattern as defined by the advance
classification based on Bloom's Taxonomy; the second was the pattern as defined
by the content (information background) of the items and foils.

One of the methods of checking the data for these patterns
which could be used is the Procrustes rotation solution to factor
analysis. The procedure began with the principal axis factor solution
and found the best rotation of this solution in a least squares sense
for a given matrix.
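
A least squares rotation of this kind can be sketched with the orthogonal Procrustes solution, which obtains the rotation from a singular value decomposition. Whether the rotation used in this study was orthogonal or oblique is not stated, so the orthogonal form is an assumption, as are the function name `procrustes_rotation` and all numerical values below.

```python
import numpy as np


def procrustes_rotation(loadings, target):
    """Orthogonal matrix T minimizing the squared error of loadings @ T - target."""
    u, _, vt = np.linalg.svd(loadings.T @ target)
    return u @ vt


# Hypothetical principal axis loadings: 4 variables on 2 factors.
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])

# Hypothetical target: the same loadings turned through a known angle,
# so that the recovered rotation can be checked against it.
angle = 0.5
known = np.array([
    [np.cos(angle), -np.sin(angle)],
    [np.sin(angle), np.cos(angle)],
])
target = loadings @ known

rotation = procrustes_rotation(loadings, target)

# The residual serves as a goodness-of-fit measure (zero for a perfect fit).
residual = np.linalg.norm(loadings @ rotation - target)
```

In practice the target would be a matrix coding the advance classification or the content pattern, and the residual would not be expected to vanish.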

A factor solution was used to remove as much measurement error
as possible from the further analytic procedures used in this study.
The phi coefficient is extremely sensitive to the marginal proportions,
particularly when the selection ratios deviate considerably from .50, as
in this study where four alternatives are being used. Slight changes
can have a profound effect upon specific coefficients. This effect can
be reduced by the factoring procedure which takes the relations among
coefficients into account.

The principal axis solution was used to get as much variance as 
possible in as few factors as was reasonable. 

A third approach normalized the principal axis matrix by rows
and then found the distances between the ends of the resultant vectors
by the usual distance formula. Clusters were then defined in terms of
minimizing within cluster distances and maximizing between cluster
distances. The mathematical procedure used in this study is given in
Appendix A (see Table 41, p. 151).
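
The first two steps of this approach, row normalization and the distance calculation, can be sketched as follows. The loadings and the function names are hypothetical, and the clustering rule itself is left out because its details are given only in the appendix.

```python
import math


def normalize_rows(matrix):
    """Scale each row of factor loadings to unit length."""
    normalized = []
    for row in matrix:
        length = math.sqrt(sum(v * v for v in row))
        normalized.append([v / length for v in row])
    return normalized


def pairwise_distances(rows):
    """Euclidean distance between the ends of every pair of row vectors."""
    return [[math.dist(a, b) for b in rows] for a in rows]


# Hypothetical loadings of four alternatives on two factors.
loadings = [
    [0.6, 0.0],
    [0.3, 0.0],
    [0.0, 0.5],
    [-0.4, 0.0],
]

unit_rows = normalize_rows(loadings)
distances = pairwise_distances(unit_rows)
```

After normalization, vectors pointing the same way are at distance zero, orthogonal vectors are at distance equal to the square root of two, and opposite vectors are at distance two, which is why small within-cluster distances correspond to members lying close to a common centroid.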

The advantage of this procedure is that if a good fit is obtained
the solution alleviates many of the problems of rotation which are
otherwise inherent in factor analytic solutions.

All three of these solutions can lead to results which can be
interpreted categorically. If a good fit is found with the target matrix
for advance classification either by process or by content, then the
categories of the original classification were to be used. If, on the other
hand, good fits were not found, then the categorical solution of the
cluster analysis procedure would be studied for possible interpretation
on the basis of either process or content. In this latter case the data
would also have to show that there were no contradictions to the content
or construct assumptions as given in Chapter II (p. 12 ff); otherwise
this study would not meet the necessary and sufficient conditions
required as outlined at the beginning of the present chapter on page 46.

An additional advantage of these three procedures is that they
all begin from a principal axis factor solution of a correlation matrix.
Furthermore, if the same factor solution is used in all three cases, the
goodness of fit of the cluster solution can also be found by the
Procrustes method. Hence all three of these solutions can be subjected
to the same criteria.

Since the object of this study was to support the construct
validity of the experimental test, the analysis began with an
attempt to find the best possible cluster solution to these construct
criteria for the data of Group A. It was decided that the best possible
solution would involve having clusters defined by the most frequently
recurring category as defined by the advance classification. The number
of these identifying elements was to be as large as possible for each
solution, i.e. the right answers and the wrong answers. The number of
factors in the principal axis solution needed for this result was then
taken as standard for all solutions involving the same kind of data.
For instance, six factors gave the best solution for the right answers
for Group A. Hence, six factors were used for all right answer analyses.

For cross-validation the identical statistical procedure used
with Group A was repeated with Group B. Cross-validation was then
established once again from a best-fit match (in terms of most
frequently recurring members) between the categorical results for Group A
and Group B. Several procedures were used until a satisfactory match
was found. Once again, the cross-validation could not violate the
construct considerations outlined in Chapter II, page 12 ff, for this
study to be successful.

Finally, the categories which were established as being
potentially meaningful in the earlier parts of the study were used as a
basis for rescoring the experimental test. The results of these
subtest scores were combined in several ways and their ability to predict
the total-correct scores of two independent achievement tests for the
same subjects was compared with the predictive power of the
total-correct score. In this latter case it would be necessary to show that
the use of wrong answers consistently improved prediction over
total-correct score and combinations of right answers.

If all these criteria were met, the value of wrong answers as
part of performance information would be demonstrated. With so many
criteria to meet, the probability that such conclusive evidence would
be found is exceedingly low. On the other hand, trends in the
directions indicated could be treated as suggestive. The borderlines
between undemonstrated and suggestive, and suggestive and conclusive,
are unclear and subject to disagreement.


Statement of the Problem 

Since this study is exploratory, attempting to demonstrate the 
presence of information in wrong answers and to discover the major 
properties of this information, an elaborate theoretical structure for 
formulating testable hypotheses was considered to be unnecessary. 
Instead, the procedures suggested for the establishment of grounded
(data-based) theory as outlined by Glaser and Strauss (1967) were used.

Such theory as is used in this study comes from well established
principles in psychology, communication theory, and test construction
theory. Beginning with the S-O-R paradigm commonly used in
problem-solving studies, it became evident from communication theory that each
of the members of this paradigm may best be considered a composite.
Any specific stimulus may be subject to a range of
interpretations. If this stimulus requires the solution of a problem, the
specific interpretations may be subject to a range of solution
strategies some of which may lead to "correct" and some to "incorrect"
solutions. In the multiple choice test, the examinee can be expected
to try to match the alternatives given him in the item within the
interpretation range and strategy range available to him. In this case,
the most reasonable assumption would be that most, if not all, responses
given by an examinee to a multiple choice achievement test would be
selected on a systematic basis.


If some characteristics of particular alternatives in two
separate items are sufficiently similar to the apparent right solution
in the view of the examinee, he can be expected to choose both of them.
If a sufficiently large number of examinees select this same pair of
alternatives, this joint selection will appear as a high correlation in
the phi coefficients relating the two events, thus becoming "systematic"
in that it would produce a significant statistical event.

In the usual procedure used for scoring multiple choice
achievement tests, only the right answers are treated as systematic in
this sense, hence the requirements usually set for their performance.
This study addressed itself to the exploration of the possibility that
"most if not all of the answers given to multiple choice achievement
tests are selected upon a systematic basis." This psychological
hypothesis is the basic theoretical proposition proposed by this study.

It is possible that wrong answers may influence the way
in which items behave, and, inferring from communication theory, the
suggestion emerges that each alternative may have more than one
interpretation among a group of examinees. Thus, there are four
possibilities: 1) that the systematic characteristics depend upon
content; 2) that the systematic characteristics depend upon the
advance classification as defined by the two process models of Bloom's
Taxonomy and the Guidelines; and 3) and 4) that the systematic
characteristics depend upon multiple interpretations as based upon
content or process. The study could have no commitment toward any of
these four possibilities.

For an exploratory study, Q.E.D. can be written at this point
without further interpretation attempts. The developmental
characteristics of wrong answers, their relationships with personality
variables, with right answers, etc. exceed the scope of this
dissertation. These topics are, of course, legitimate areas for future
research.


The Sample Used 

The experimental test was administered to 277 summer school
students in a one semester course in educational psychology at the
senior level. The age range of these students was from 19 to 55
with the median age about 30, and most of the examinees had some
teaching experience. The overall group was subdivided by random
assignment into two groups (Group A, of 139 students, and Group B, of
138 students). A t-test for independent samples based on the
total-correct scores of the experimental test, designed to confirm the
equivalence of the scores of these two groups, is reported on page 92.

CHAPTER V


RESULTS AND THEIR INTERPRETATION

A somewhat different procedure from the one usually employed was
adopted for this study. To begin with, the usual procedure for scoring
and interpreting multiple choice achievement tests is to count the
number of items each examinee has correct. This procedure is sometimes
modified by the specification, by various methods, of subtests of the
total test. One of two procedures is usually employed. Either
the experimenter establishes the categories into which the items fall in
advance of the test administration and then interprets his results on
this basis, or he groups his results on the basis of some analytical
procedure and then endeavours to interpret these groupings. Powell and
Isbister (1969) used the former procedure, and Powell (1968) used the
latter. In general, only right-answer information is used.

The present study endeavours to link advance classification and
statistical classification, and also endeavours to use wrong answers as
well as right answers in the interpretation of test results. As has
already been indicated, very little research of the type just described
is present in the literature. For this reason, this study can best be
described as exploratory, in which negative results are more likely to
be indicated than are positive results.

Each item on the experimental test had four alternatives, hence
the study began with four variables for each item. The response matrix,
therefore, contained a "one" (1) for each alternative selected by each
examinee; otherwise "zero" (0). Since the examinee was allowed no more
than one choice per item for 30 items, each examinee would have a
maximum of 30 "ones" in the vector of 120 variables which represented
his selections. Because these selections were further restricted to one
in each group of four, each variable is linearly dependent upon the
other three in the same item. In order to remove these linear
dependencies, the performance matrix was partitioned into a right-answer
matrix and a wrong-answer matrix. These latter two matrices were
subsequently treated as being independent.
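
The partitioning just described can be illustrated with a minimal
sketch. The function names, the coding of choices as 0 to 3 (with -1 for
an omitted item), and the answer key argument are assumptions made for
illustration only; they are not taken from the study's own programs.

```python
import numpy as np

N_ITEMS, N_ALTS = 30, 4  # 30 items, four alternatives each (120 variables)

def build_response_matrix(choices):
    """choices: (n_examinees, N_ITEMS) integers 0..3, or -1 for an omit.

    Returns an (n_examinees, N_ITEMS, N_ALTS) array holding 1 for the
    alternative selected and 0 otherwise.
    """
    choices = np.asarray(choices)
    resp = np.zeros((choices.shape[0], N_ITEMS, N_ALTS), dtype=int)
    rows, items = np.nonzero(choices >= 0)
    resp[rows, items, choices[rows, items]] = 1
    return resp

def partition(resp, key):
    """Split the response array into right-answer and wrong-answer matrices.

    key: (N_ITEMS,) index of the keyed (correct) alternative per item.
    """
    is_right = np.zeros((N_ITEMS, N_ALTS), dtype=bool)
    is_right[np.arange(N_ITEMS), np.asarray(key)] = True
    right = resp[:, is_right]    # one column per item
    wrong = resp[:, ~is_right]   # three foil columns per item
    return right, wrong
```

With 30 four-choice items this gives a right-answer matrix of 30 columns
and a wrong-answer matrix of 90 columns for each group of examinees.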

In order to attempt to cross-validate the findings, the 
examinees were randomly assigned to two groups, Group A and Group B. 

All the statistical analysis done which was not related to
cross-validation was performed on the data from Group A. The relationship
between the mean total-correct scores of Group A and Group B is given 
in Table 

In addition, since a relationship between advance classification
and statistical ordering was being attempted, an advance classification
system was used separately for the items as represented by their correct
alternatives and their foils. These classification systems were
discussed in detail in Chapters II and III.

Since an attempt to find a consistent interpretation of
performance is being made, the examinees were randomly assigned to two
groups so that the interpretations could be examined for
cross-validation. Hence, the basic data for this study consist of two phi
correlation matrices (see: Appendix A, Tables 32 to 39) for each of
the two groups. The correlation matrices represent the intercorrelation
between variables across examinees for the right answers and for the
wrong answers in each group.
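
As an illustration of how such a matrix can be obtained, the sketch
below computes phi coefficients between the columns of a 0/1 matrix.
For dichotomous variables the phi coefficient is the product-moment
correlation, so the computation is written in those terms. The function
name and the handling of zero-variance columns are assumptions made for
this example.

```python
import numpy as np

def phi_matrix(binary):
    """Phi coefficients between columns of a 0/1 matrix (examinees x variables).

    Columns with no variance (never or always selected) yield NaN.
    """
    x = np.asarray(binary, dtype=float)
    n = x.shape[0]
    p = x.mean(axis=0)               # proportion selecting each variable
    joint = x.T @ x / n              # proportion selecting both of a pair
    cov = joint - np.outer(p, p)
    sd = np.sqrt(p * (1.0 - p))
    with np.errstate(divide="ignore", invalid="ignore"):
        return cov / np.outer(sd, sd)
```

A high positive value for a pair of alternatives from separate items
indicates that examinees who chose one tended also to choose the other.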


Finally, two achievement test scores were obtained for each
examinee. One of them was concurrent in the sense that the
experimental test formed a subtest in the mid term examination given in
a one-semester course. The other achievement score was part of the
final examination in the same course. These data were collected so that
the predictive validity of the various interpreted categories and their
predictive cross-validation could be determined.

Several steps were taken in each phase of the analysis. For
instance, attempts were made to interpret the right answers on the basis 
of both factor analysis and inter-point distance cluster analysis. This 
step was followed by a detailed logico-semantic analysis of the right 
answer clusters in an attempt to interpret these clusters. 

A similar logico-semantic analysis was made of the wrong-answer 
clusters. 

Attempts were then made to cross-validate the advance
classification, the interpreted clusters and a particular grouping of
the interpreted clusters.

Finally, the predictive validity of the advance classification, 
the interpreted clusters, and the grouped clusters was found. This 
validity was found in each case by using the right answers alone, and 
the combination of both right and wrong answers. 


The discussions which follow adhere to this sequence. 


Interpretation of Right Answers Using Factor Analysis 

On the basis of the advance classification there were two
possible interpretations based upon either of two independent
classification systems with respect to the right answers given by the
examinees. One of these interpretations could have been best described
as a "process" interpretation based upon classification of the items on
the basis of Bloom's Taxonomy. The other possible interpretation was
"content" in which the items were classified on the basis of the
information background required to answer them.

An attempt was made to verify the possible existence of either
or both of these two interpretations. The primary data for this attempt
were a six-factor unrotated principal axis factor matrix derived from the
phi correlations for the right answers. This matrix was rotated by a
Procrustes solution to find the best fit (in a least squares sense) to
two target matrices. The first of these targets specified a simple
structure which indicated the way in which the items were classified
using Bloom's Taxonomy. The second target matrix specified a structure
which indicated which items referred to each of the several reading
selections. The matrix structure was not always simple since some of
the items referred to more than one selection.
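
The least squares step of such a rotation can be sketched as follows.
The text does not specify which Procrustes variant was used, so this is
only a generic illustration: the transformation is the ordinary least
squares solution, and rescaling its columns to unit length is an
assumption of this example rather than a documented detail of the study.

```python
import numpy as np

def procrustes_fit(loadings, target):
    """Rotate an unrotated factor matrix toward a target matrix.

    loadings: (n_items, n_factors) unrotated loadings.
    target:   (n_items, n_axes) hypothesised structure.
    Returns the fitted pattern and the transformation matrix.
    """
    A = np.asarray(loadings, dtype=float)
    B = np.asarray(target, dtype=float)
    # T minimises the sum of squared differences between A @ T and B.
    T, *_ = np.linalg.lstsq(A, B, rcond=None)
    # Assumed convention: rescale each column of T to unit length.
    T = T / np.linalg.norm(T, axis=0)
    return A @ T, T
```

The fitted pattern is then compared with the target, item by item, to
judge how well the hypothesised classification is reproduced.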

Table 3 (see: p. 60) gives the target matrix and the pattern on 
the primary axes as related to the “process” classification of these 
items. 

It is evident from the results that the pattern does not
reproduce the target matrix in any satisfactory manner. This finding
suggests the conclusion that the advance classification of items using
Bloom's Taxonomy did not give a satisfactory indication of the way in
which each item performed. Table 4 gives the correlation between the
primary axes in this solution. Table 4 is on page 61.

TABLE 3


PROCRUSTES ROTATION OF THE ADVANCE
CLASSIFICATION OF RIGHT ANSWERS

Item      Bloom's Classification

a. The numbers in italics had the highest loadings.

b. Only those loadings with an absolute value of 1.00 or greater are
shown.

c. The items which are starred (*) approximate the target.

TABLE 4


PROCRUSTES ROTATION OF THE ADVANCE
CLASSIFICATION OF RIGHT ANSWERS


        I       II      III     IV      V
II     -.96    1.00
V       .95    [illegible]     -.94     .90    1.00


The primary axes (Table 4) were highly correlated, suggesting
that by this classification system, there may be only one factor
present.

As indicated on page 88 an identical procedure was used to
examine the data for the possible presence of "content" factors.

Table 5 (see: p. 62) gives the target matrix and the pattern on primary
for this Procrustes rotation.

The fit of this matrix to the target based on content is only
slightly better than for process-oriented advance classification.
The Factor V loadings for items 21, 24, and 25 show a nearly simple
structure which coincides with the target matrix. These three items
also formed a unique cluster on the basis of the cluster analysis
conducted later in the study. It is possible, however, to give a
process interpretation to the cluster, which may mean that this pattern
for content might be coincidental.

Aside from these items the pattern did not reproduce the target
matrix in an acceptable manner. Content would seem to be only slightly
better than process as a means of classifying items in advance of their
use.


Table 6 shows the correlations between the primaries. 


TABLE 6 


PROCRUSTES ROTATION OF THE INFORMATION 
CONTENT OF RIGHT ANSWERS 





Correlation between Primary Axes 


        I       II      III     IV      V
I      1.00
II      .14    1.00
III    -.76    [illegible]    1.00
IV     -.52    [illegible]    [illegible]    1.00
V       .30    [illegible]    -.46    -.08    1.00


The Interpretation of Right Answers Using Cluster Analysis 

The negative results just reported suggested the need to search
for a multiple interpretation possibility. Hence, a cluster analysis
procedure was used which normalized by rows the same factor matrix as
was used in the two solutions just given. The normalization involves
dividing each member of a row in the factor matrix by the square root of
the communality. Since this value is the length of the vector given by
the row of factor loadings, this division raises this length to unity
(one).

The procedure then calculates the interpoint distances from the
ends of the vector pairs, surface-to-surface, across the hypersphere.
The square of this distance is the sum of the squares of the differences
across the rows taken in pairs. This interpoint distance is then used
to form clusters in which the within-cluster distances are minimized as
indicated by the formulae given in Table 41, p. 151. The clustering
begins with as many clusters as variables and ends with all variables in
one cluster. In addition, there is a unique cluster solution for each
factor solution which might be used, or with the inclusion of each
additional factor. The experimenter, therefore, was left with the
problem of determining which of many possible solutions to use.
Repeated attempts suggested an advantage to the process classification
of Bloom's Taxonomy. It was decided to consider a cluster to have
recapitulated the advance classification if it contained more members
from one particular advance category than from any other category. The
solution which gave the best recapitulation was then sought, by
iteration, for both right and wrong answer clusters. The cluster was
assumed to be identified on the basis of the recapitulated category.
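
The row normalization, the squared interpoint distance, and the
recapitulation rule can be sketched as below. The clustering formulae
of Table 41 are not reproduced here, so the sketch stops short of
forming clusters. The function names are hypothetical, and counting
only the members of the most frequent category as "recapitulated" is
one reading of the rule stated above.

```python
import numpy as np
from collections import Counter

def normalize_rows(loadings):
    """Divide each row by the square root of its communality (its length)."""
    F = np.asarray(loadings, dtype=float)
    length = np.sqrt((F ** 2).sum(axis=1, keepdims=True))
    return F / length

def squared_interpoint_distances(unit_rows):
    """Sum of squared differences between every pair of rows."""
    diff = unit_rows[:, None, :] - unit_rows[None, :, :]
    return (diff ** 2).sum(axis=2)

def recapitulated_count(cluster_labels, advance_categories):
    """Count members belonging to the strictly most frequent advance
    category of their cluster; clusters with a tie contribute nothing."""
    total = 0
    for c in set(cluster_labels):
        cats = [a for l, a in zip(cluster_labels, advance_categories) if l == c]
        ranked = Counter(cats).most_common()
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            total += ranked[0][1]
    return total
```

Dividing the count returned by the last function by the number of
variables gives a recapitulation proportion for a candidate cluster
solution, which is the kind of figure compared across solutions.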

For the right answers, in this sense, the best solution was
derived from an unrotated principal axis factor matrix of six factors.
In this solution twelve of the thirty items recapitulated in the
clusters. This result is four times better than the Procrustes rotation
to fit content just reported. Since this was 40 per cent of the items,
in spite of multiple interpretation possibilities, the result was
reasonably satisfactory.

An examination of the data suggested that the first unrotated
factor might be a "difficulty" factor. Table 7 on page 66 expands upon
this relationship.

In general, the value of the loading on Factor 1 seems to be
about 50 per cent of the value of the difficulty. The correlation
between these two variables was r = .65.

Does this finding seriously disrupt the use of the six factor
solution? Table 8 on this page compares the interpoint distance
clusters as determined with and without Factor 1.


TABLE 8


RIGHT ANSWER CLUSTERS
WITH AND WITHOUT
FACTOR 1

With Factor 1          Without Factor 1
[Table body not recoverable.]

a. The numbers in italics were displaced with the removal of Factor 1.

TABLE 7


RELATIONSHIP BETWEEN ITEM CONSISTENCY, ITEM
DIFFICULTY AND ITEM FACTOR LOADING ON
UNROTATED FACTOR 1


Item      Internal
Number    Consistency    Difficulty    Factor 1
5         .304           .345          .176

The italics in Table 8 on page 65 indicate that seven items move
when Factor 1 is dropped. This fact does not seriously affect the
replication of the advance classification in the new solution. Also,
most of the items that move to a new cluster do not follow the general
rule that the difficulty be roughly twice the factor loading. For these
reasons, it was decided to retain the complete six factor solution
throughout subsequent analyses.

Since a Procrustes rotation was used to determine how well the
advance classifications fit the data, it was reasonable to use the same
procedure with the cluster analysis data. Table 9 (see: p. 68) reports
the target and pattern matrices in this solution.

The pattern on primary in Table 9 is a good approximation of the
simple structure of the target matrix. Furthermore, if the values of
h (i.e. the square root of the communality from the six factor matrix)
are taken into account, then the pattern seemed to fit even better. The
cluster analysis was produced from the interpoint distances derived from
a normalized matrix. For this reason, finding that the largest loading
approximates the overall length of the vector in the unrotated
six factor solution adds to the acceptability of the solution.

In short, the cluster solution was a far better fit to the data
than either of the two methods of advance classification. The
independence of the interpoint distance clusters from rotation problems
also reinforces the acceptability of this approach. Table 10 (see: p. 69)
reports the correlation between the axes.

The relatively low correlations between the axes in Table 10
further strengthen the support for the use of interpoint distances as
an analytical technique for the problem under examination in this study.

TABLE 10


PROCRUSTES ROTATION OF THE 
CLUSTER ANALYSIS OF THE 
RIGHT ANSWERS 


Correlation between primary axes 





C C 

tb 5 46 wD Cg “0 
Cy 1.00 
Si ager 1.00 
Ce = olf we 100 
Co meee “pon -.03 OO 
Ce 06 -.29 - 03 -.08 1.00 

c ey 

C39 oe «20 o15 ee 209 a 


The largest correlation between axes given in Table 10 was -.31, which
represents an angle of more than 70° between this pair of axes. It is
possible, therefore, to state that interpoint distance clusters produced
a solution which was independent of the usual rotation problems and
which was approximately orthogonal. The procedure gave a very
satisfactory statistical representation of the data. On the other hand,
the failure of the advance classification systems to render
interpretability left the researcher with the problem of interpreting
these clusters.
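
The link between a correlation and the angle separating two axes is
that the correlation is the cosine of that angle. The short sketch
below (function name hypothetical) shows the arithmetic for the largest
correlation quoted above; whether the original text reported the obtuse
angle or its acute supplement is not clear from the surviving copy.

```python
import math

def angle_between_axes(r):
    """Angle in degrees whose cosine equals the correlation r."""
    return math.degrees(math.acos(r))

obtuse = angle_between_axes(-0.31)   # angle between the axes as signed
acute = 180.0 - obtuse               # supplement, ignoring the sign
```

Either way the pair of axes lies well away from collinearity, which is
the sense in which the solution is described as approximately
orthogonal.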

Finally, a matrix which is in fact categorical in the sense
given on page 47 would be expected to have low correlations between the
primary axes for a Procrustes rotation. When the dimensionality of this
categorical matrix has been reduced by factor analytic techniques,
before rotation, the best fit from a Procrustes rotation should display
the retained lengths of the vectors (i.e. the square roots of the
communalities) as the loadings of the pattern on the primary axes.
Also, the pattern on the primary axes should display a structure similar
to the target matrix. All three of these properties must be present for
the inference to be made that the principal axis matrix is categorical.
It is evident from Tables 9 and 10 (pages 68 and 69) that all these
conditions were met, strongly suggesting that the cluster analysis
technique gives a good categorical solution to the six factor data. The
low internal consistency was thus explained, suggesting profile scores
from the clusters may better describe the data than total correct scores
alone. Finally, the close match between the original vector lengths and
the factor loadings in the Procrustes rotation and the near orthogonality
of the factors suggest that this solution clearly identifies the
homogeneous subtests of the right answers. These findings contradict the
apparently poor showing of this test in the usual analytical setting.


The Meaningful Interpretation of Item Clusters 
With the failure of the advance classification, the multiple
interpretation hypothesis was the only alternative to investigate; hence
some reasonable common ground was needed for each cluster if answering
was systematic. The first possibility which had to be either confirmed
or eliminated was that the clusters were sufficiently strongly content
oriented to suggest content as a possibility. The only cluster in which
content was at least a strong contender to process was Cluster C10,
containing items 21, 24, and 25. All these items were based upon reading
selection number five (Progress), but also, all three were classified as
Analysis items. A closer look was taken at this cluster, along with all
the others (see: Appendix C). This look supported the process
interpretations over the content interpretation. Thus, it was reasonable
to reject content as a basis for the interpretation of any of the right
answer clusters.

The misfit items in the identified clusters were examined by
logico-semantic (structure-meaning) analysis to determine if they could
be reasonably reclassified in common with the recapitulating category.
Success by this procedure would lend support to the multiple
interpretation hypothesis. All misfit items in the identified clusters,
except possibly item 7, could be reclassified to fit the overall
classification of these clusters, adding three or four items to the
original twelve.


The logico-semantic analysis of Cluster C8 suggested that the
formation of an inductive structure within a particular reading
selection could possibly be the basis for Synthesis items. For this
reason C8 was classified as Synthesis, adding another three items to the
support of the multiple interpretation possibility.

This procedure gave 19 of the 30 items, or 63 per cent of the
items, a reasonable classification based upon process. Since the
interrater reliability was only moderately high (r = 33) and since the
cross-validation based upon interpretable reasons found in Powell (1968)
was only 64 per cent, this level of recapitulation can be considered to
be satisfactory.

For the remaining four clusters the interpretation was ambiguous.
One cluster, containing items 15 and 18, seemed to be simply poor items.
Another cluster seemed to involve implication or extrapolation from the
relevant content, suggesting Comprehension, but in the absence of better
definitions for strategies it was safer to call this cluster ambiguous
(see: p. 197). The remaining two clusters seemed to be
strongly influenced by the nature of the foils in the items, which
tended, in general, to lower them to Comprehension items; but each had
some Analysis characteristics, leaving their classification ambiguous,
which seemed to further support the multiple interpretation hypothesis.
These decisions are summarized in Table 11, page 73.

Trying to make Synthesis items by combining two or more reading 
selections proved unsuccessful for a number of reasons, suggesting the 
need for more research on this method. 

Thus, the results of the logico-semantic analysis of the item 
clusters suggested reasonable support for 1) the multiple interpretation 
hypothesis, 2) the transcendence of process over content, and 3) the 
suggestion that foils influence item performance.

The summary just presented is supported by a detailed discussion
given in Appendix C (see: p. 186 ff). The details are also presented,
first, because it was felt that the effective reclassification
represented evidential support for the multiple interpretation
hypothesis, and second, because subsequent researchers might find value
in an independent evaluation of the logic behind the conclusion of this
study.


The Meaningful Interpretation of Wrong Answer Clusters 


The tables for the factor analysis of the wrong answers were
very large since they involved 90 variables (see: Appendix A, Tables 34
to 39). The analysis of the right answers showed that interpoint
distance cluster analysis gave a good representation, in a statistical


The phi coefficients in these tables are in the following
sequence: variables 1 to 30 represent 1D1 to 30D1; variables
31 to 60 represent 1D2 to 30D2; and variables 61 to 90
represent 1D3 to 30D3.

TABLE 11


CLASSIFICATION AND MEMBERSHIP OF 


RIGHT ANSWER CLUSTERS AS 





Item


Cluster 

Label Membership 
of ie 2 
C., 3 LY? 30 
C, 4 15 6 
Cy 5 I aS, 
o) Zoe 2 BIN 25 
2 
Ce. isa 
Co 10 We eels) 
Ce PZ 20 26 
oe 25 18 
Cio ea 25 


DERIVED FOR GROUP A 


Advance
Classification


28 Analysis 


Analysis 
Analysis 


29 Evaluation 


24 Analysis 


Interpreted
Classification


Analysis 
Ambiguous 


Ambiguous 


Analysis 
Analysis 
Evaluation 
Synthesis 
Ambiguous 


Analysis 


a. The numbers in italics were all in the category named in the
advance classification.

b. The "interpreted classification" of each cluster is given in
Appendix C beginning on page 186.

sense, for those data. Finally, the interrater reliability was lower
for wrong answers than right answers, suggesting that the advance
classification of foils would be less likely to reappear in the data
than that of the right answers. For these three reasons, the
interpretation of the wrong answers began with the cluster analysis.
Attempts to fit the unrotated principal axis factor matrix to the
advance foil classification and to the foil content were not made.
There was no reason on the basis of the characteristics of the results
of the cluster analysis to assume that the results of these two
preliminary steps would have been substantially different for the wrong
answers than they were for the right answers.

The best replication of the advance foil classification which
could be found in the cluster analysis involved a 25-factor solution to
the phi correlation matrix of the wrong responses, and 15 clusters of
foils in the solution. This replication placed 28 of the 90 foils
(or 31 per cent) into clusters which might be considered equivalent to
the categories of the advance classification on the basis of the most
frequently occurring category of foil in that cluster. This proportion
(31 per cent) was not quite as good for the wrong answers as the
corresponding proportion (40 per cent) was for the right answers.

In getting this best replication, the same procedure was used
for determining the number of clusters as was used for right answers.

All 90 foils were clustered. Once the best replication was
found, however, those foils for which the selection ratio was less than
.06 were dropped from further consideration. This precedent was
established when the interpretation of the right answer clusters was
being made. The result of the dropping of foils of low selection ratio
was to reduce the number of foils under consideration from 90 to 60.
Of these 60 foils, 18 foils (or 30 per cent) continued to meet
replication requirements. The proportion is essentially the same as
before.

The interrater reliability for foil classification (r = .62)
was not as high as for the right answers. It is possible that this
figure might have been considerably lower, had the right answers to the
items not been clearly indicated to the raters at the time of the rating.
The procedure of rating involved comparing the foil, stem, and right
answer relationships to the definitions of foil categories as listed in
Chapter III. It became fairly evident that at least some of the foils
might be placed quite reasonably into several different categories. The
problem of multiple classification of foils will be dealt with in more
detail in the interpretation of the wrong answer clusters
(see: Appendix C, p. 209 ff).
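
The rule for dropping foils of low selection ratio can be illustrated
with a short sketch. It assumes that the selection ratio of a foil is
the proportion of examinees who chose it, which the text does not
define explicitly; the function name and the default threshold argument
are likewise introduced only for this example.

```python
import numpy as np

def retained_foils(wrong, threshold=0.06):
    """Return indices of foils kept and the selection ratio of every foil.

    wrong: (n_examinees, n_foils) 0/1 wrong-answer matrix.
    A foil is dropped when its selection ratio is below the threshold.
    """
    wrong = np.asarray(wrong)
    ratio = wrong.mean(axis=0)
    keep = np.nonzero(ratio >= threshold)[0]
    return keep, ratio
```

Applied to the 90 foil columns of one group, the first returned array
would list the foils that remain under consideration.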
Briefly, the advance classifications of foils were arranged into
three or four possible general classes. These general classes were
1) Strategy Errors, 2) Misreading, 3) Misinformation, and 4) Other.
The Other (O) category was for foils which for some reason could not
be readily classified into some established category. In the
experimental test this category referred primarily to the foils for
items 19 to 24 inclusive. In these items a different item format was
used from that of the remainder of the items. It was not possible, in
advance, to know whether or not this difference in item format would
influence the way in which the foils behaved statistically. It was
assumed in the interpretation of the clusters that "O" type foils which
were found to be in reasonable association with foils of specified
categories were not influenced in their behavior by the item format. If
the Other (O) type foils formed their own unique clusters, these
clusters were assumed to represent categories of foil not identified by
the advance classification of foils. Two such categories appeared in the
data.

The specific categories used in the advance classification of
foils were as follows:


     Name                        Symbol
 1.  Overgeneralization          OG
 2.  Oversimplification          OS
 3.  Substitution                Sub
 4.  Inversion                   Inv
 5.  Invalid Assumption          IA
 6.  Irrelevancy                 Irr
 7.  Common Misconception        CM
 8.  Word-Word Link              WW
 9.  Transposition               [symbol illegible]
10.  Redefinition of Terms       RT
11.  Other                       O


A cluster was identified on the basis of the most frequently
occurring foil of a particular advance classification in that cluster.
The identification given in Table 12 (see: p. 77) is based on all of
the foils in each cluster before low selection ratio foils were
eliminated. This identification was used as a starting point for the
attempt at meaningful interpretation of each foil cluster. The final
meaningful interpretation of each cluster is also given in Table 12.

Once again the members of each cluster were examined in an
attempt to determine the common basis upon which these foils clustered
together. Particular attention was paid to the foils which were not in
the common advance category present in the cluster. In addition, the
evident influence of foils upon the performance of an item led to the
possibility that the interpretation of a foil by category might depend
upon the point of view of the interpreter. The low interrater
reliability suggested this possibility. For at least some foils several
classifications may have been reasonable. In this case the foil
categories would not be independent or unambiguous. If foils are only
interpretable after a test has been given, then these interpretations
become specific to the particular examinees upon whose performance the
interpretations were based. That is, the expectation would be that
where multidefined foils were concerned, the proportion of
cross-validation, on a cluster-for-cluster basis, would be low.

A procedure similar to the one employed for item clusters was
used with wrong answer clusters. For complete details see Appendix C
(see: p. 209 ff).

The possibility of multiple interpretations of wrong answers was
most clearly illustrated in the case of foil 2D1. The classification
of this foil was initially set as Substitution (see: p. 156)
because it substituted a conjunctive for a disjunctive relationship.
However, it is possible to look at these relationships as
Overgeneralization or Oversimplification, as the discussion in
Appendix C indicates.

Apparently, considerably closer analysis of the logico-semantic
relationships within items and their components would probably reveal a
range of possible interpretations for each item. This multiple
interpretation effect is consistent with the findings using right
answers. For foil clusters, the lower recapitulation left five clusters
containing 28 foils from the 60 that remained classifiable because of
the advance classification of their members. Of these, ten foils did
not have these classifications in advance, but eight of them could be
reasonably reclassified into the category of the total cluster.

Only one cluster, which contained nine foils, could not be
classified by logico-semantic analysis, although one or two more of
these were shaky at best. One cluster was classified Common
Misconception on the basis of logico-semantic analysis because the
characteristics of this cluster were reminiscent of the findings of
Powell and Isbister (1969), thereby suggesting consistency between two
independent groups on the same test. Three categories (NS, O1, O2) were
added to the Guidelines because of the clustering as well as the
logico-semantic analysis.

Clearly, the Guidelines were not exhaustive and were not mutually
exclusive, as already anticipated by the establishment of the "Other"
classification, and by the multiple interpretation hypothesis. Because
of the support for this hypothesis it is likely that the interpretations
formulated should be confined to the group upon which they were derived.


A Possible Hierarchy of Foil Categories 
During the above analysis it became evident that foil categories
were, to a degree, interchangeable. For instance, foil 2D1 could be
classified as OG, OS, or Sub. In only a few cases did reclassification
within a wrong answer cluster of specific foils seem unreasonable. No
attempt was made to exhaustively reclassify foils. Instead, the attempt
was to reclassify specific foils to fit the pattern which seemed to be
evident within the entire cluster as derived from Group A. Table 13 on
page 80 summarizes the reclassification which occurred for each foil
class in each of the wrong answer clusters.

TABLE 13


RECLASSIFICATIONS WHICH WERE
MADE OF SPECIFIC FOILS


Cluster    Advance Classification              Reclassification
W[?]       Sub                                 became OG
W[?]       Sub, OS and OG                      became CM
W[?]       OS and OG                           became Inv
W[?]       Inv, Sub, OG, Irr                   became NS
W[?]       Inv and Sub                         became IA
W[?]       [illegible]                         became OS
W[?]       Irr and OS, Sub, CM                 became WW
W[?]       OG                                  became O[?]

Since the O class in Table 38 was treated essentially as though 
it were unclassified it was omitted from this table, as were foils which 
did not classify within their respective cluster. Table 13 lists the 
Advance Classification of the foils in each of the wrong answer clusters 
which were different from the final classification of that cluster and 
the final classification given. As such it summarizes Tables 53 to 66 
inclusive. 


Put the other way around, the substitution category of foils 
disappeared by reclassification as OG, CM, NS, and IA. Besides 
retaining a cluster for itself, OG also reclassified as CM, Inv, and O. 
OS also retained its own category but also became WW, CM, Inv, and NS. 
Similarly, some of the Inv foils reclassified to NS, IA, and OS. 
Finally, some Irr foils became NS, OS, and WW. Figure 2 presents these 
changes diagrammatically. 





FIGURE 2 


FOIL RECLASSIFICATION PATTERN 




In Figure 2 there is a double headed arrow between OS 
(Oversimplification) and Inv (Inversion). This arrow means that at 
least one OS foil was reclassified as an Inv and at least one Inv foil 
was reclassified as an OS (Oversimplification). The other arrows in the 
figure can be interpreted in the same way. 

Sub foils disappeared, and hence are dropped from this diagram. 
From the analysis of the right answer clusters it became evident that 
CM and WW type foils tended to lower the level in the taxonomy of an 
item and OS and OG foils tended to raise it. 

From the reclassification pattern in Figure 2, OS foils seem to 
be pivotal in the sense that they were most frequently involved in 
changes (from or to). Furthermore, NS, Inv, and OG may be higher order 
foils, and CM, WW, and Irr lower order foils, on the basis of the changes 
these foils seem to cause in the reclassification of right answers. 

Thus the general order of a possible foil hierarchy would seem 
to follow a vertical axis in Figure 2, with the lowest level at the 
bottom. 

O and IA foils are ambiguous in this pattern because they have 
one-way linkages only. 

Other evidence to support the possibility of a hierarchy of 
foils should be found before this property of foils can be considered to 
be established from the data. Any set of random variates can be ordered 
on the basis of relative magnitude into an arbitrary hierarchy. Two 
independent random variables will be uncorrelated. If, however, two 
correlated hierarchies can be produced from two independent variables, 
this production would suggest that these two variables may be 
functionally related. 

One possible source of such an arbitrary hierarchy is to consider 
the average total-correct score of the respondents selecting each 
interpreted foil as listed in the Appendix in Table 40. The average of 
these averages could be found for each interpreted wrong-answer cluster. 
This average total-correct score for all the members of a particular 
cluster may reflect a systematic property of the examinees which may 
relate the kind of "error" made to the total-correct score achieved. 

If foil selection patterns reflect a functional relationship 
between "errors" and total-correct score, the rank of a foil (from one to 
three for high to low) within an item based on the average total-correct 
score should also reflect this functional relationship. If each item is 
independent of each other item, the average within item rank of the foils 
across a cluster should also be independent of the average total-correct 
scores of the foils in that same cluster. This is a reasonable 
assumption since only in item cluster C does there not seem to be a close 
relationship between item clusters and foil clusters for the items in 
that cluster. An exception to this expectation would arise in the event 
that there is, in fact, a functional relationship between the kind of 
error and the total-correct score. In this latter case, both these 
procedures should produce roughly the same hierarchy. 

In addition, this hierarchy would be expected to reflect the 
pattern which seemed to be evident on the basis of the pattern of 
reclassification of foils, the reclassification of items as influenced by 
foils, and the indications which also arise from other research into 
wrong-answer patterns. 

Since this part of the study is exploring the possibility of a 
hierarchy among the foil categories, the rank order of the two variables 
just described was determined. That is, the rank order of the average 
for each interpreted foil cluster of the average total-correct score 
associated with each foil in that cluster was found. This procedure 
established an arbitrary ordinal relationship among the interpreted foil 
clusters. A ranking of the average within item ranks was also 
established. If there is no functional relationship between foil type and 
total-correct score, the rank order correlation between these two 
ranking systems should be near zero. That is, the within item and 
between item characteristics would not be related to the average 
total-correct score on each foil in a functional manner. Table 14 shows 
the ranking of foil clusters by two independent methods (see: p. 85). 

Each of the two ranking systems in Table 14 tends to support the 
more general ranking suggested by the reclassification pattern. 
The comparison between the two ranking systems gave a rank order 
correlation of ρ = .68, which is significant (p = .01 for two-tailed test for 
N = 2). 
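
The comparison of two ranking systems described above can be sketched in a few lines. The following is only an illustration of a rank order (Spearman) correlation, computed as the product-moment correlation of the ranks with tied values sharing their mean rank. The function names and the two rank lists are hypothetical and are not the values of Table 14.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Rank order correlation: Pearson correlation of the two sets of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5


# Hypothetical ranks of six foil clusters under the two methods.
rank_by_total_correct = [1, 2, 3, 4, 5, 6]
rank_by_within_item = [2, 1, 3, 5, 4, 6]
rho = spearman_rho(rank_by_total_correct, rank_by_within_item)
```

A value of rho near zero would be expected if foil type and total-correct score were functionally unrelated; a clearly positive value indicates that the two methods order the clusters in much the same way.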

Other findings support this hierarchy. For instance, Powell and 
Isbister (1969) found that IA foils correlated negatively with Synthesis 
items. Foils in this category would be expected to have the fairly low 
rank this study suggests for IA foils. On the other hand, placing this 
foil type just above the content-linked misreading categories may be 
placing it too low in the hierarchy. Similarly, RT foils seemed to be 
content-linked, and would be expected to have a low rank for this reason. 
Support for this extrapolation was also evident. The tendency for Irr 
foils to distract middle and high level performers has already been noted 
in Powell (1968) and Powell and Isbister (1969). Thus the ranking by 
average total-correct score, which seems to place Irr foils in sixth 
place, would seem to be too low. Similarly, the ranking on a within item 
basis, where Irr foils share top place, would seem to be too high. 


By itself, the ranking on total-correct scores would seem to be 
fairly closely related to a hypothetical foil hierarchy on a between 
item basis. Similarly, the within item ranking would seem to be more 
closely related to the influence of foil categories upon the items. 
However, these two variables are obviously related, as indicated by the 
significant rank order correlation between them. These results suggest 
overall systematic answering which influences the statistical outcomes 
of both within and between item events. 

TABLE 14 

RANKING OF FOIL CLUSTERS BY TWO 
INDEPENDENT METHODS 

Interpreted                  Rank by total-correct     Rank by average 
foil clusters                averages within           within item rank 
                             clusters                  by clusterᵃ 
Cluster   Interpretation     Average    Rank           Average    Rank 
W         OG                 12.0       [illegible]    2.4        [illegible] 

a. The within item rank includes the right answer and does not drop 
any foils. 

Perhaps the most reasonable basis for the hierarchy would 
be to consider both events to be interdependent. The simplest approach 
in this case is to consider the average rank of the two separate 
ranking systems used and to rearrange the foil categories accordingly. 
The resulting hierarchy would then reflect the influence of both between 
and within item events. Table 15 gives the results of this 
procedure. 


TABLE 15 

RERANKING OF FOILS 
BY AVERAGE RANK 

This rearrangement put RT nearer the bottom, OS nearer the 
middle (as its pivotal position suggested), and Irr nearer the top than 
the average total-correct rank, which seems to be reasonable relative to 
the available evidence. This new reordering has not been used in 
subsequent analysis because of the problem of the relevance of the within 
item ranking to this hierarchy. Further research is needed before the 
most probable sequence in the hierarchy has been established. 
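
The reranking by average rank can be illustrated with a short sketch. It assumes only that each foil category has one rank under each of the two systems, and it breaks ties in the averaged rank alphabetically, which is an arbitrary choice made here for reproducibility. The function name and all rank values are hypothetical and do not reproduce Table 15.

```python
def rerank_by_average(rank_a, rank_b):
    """Average two rank assignments and order the categories by that average.

    rank_a and rank_b map category label -> rank (1 = highest).
    Ties in the averaged rank are broken alphabetically (an arbitrary choice).
    """
    avg = {label: (rank_a[label] + rank_b[label]) / 2 for label in rank_a}
    ordered = sorted(avg, key=lambda label: (avg[label], label))
    return ordered, avg


# Hypothetical ranks for four foil categories under the two systems.
by_total_correct = {"OG": 1, "OS": 4, "Irr": 6, "RT": 5}
by_within_item = {"OG": 2, "OS": 3, "Irr": 1, "RT": 6}

ordered, avg = rerank_by_average(by_total_correct, by_within_item)
```

In this toy case a category ranked low by one system and high by the other moves toward the middle of the composite order, which is the kind of adjustment the rearrangement above describes.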

The lowering of the performance level of synthesis items in C 
[illegible] further supports this hierarchy. Foil 17D was reclassified 
as OS in the logico-semantic analysis of the wrong answer cluster. The 
pivotal position of the OS category would seem to support the 
"double-strategy" interpretation of C5. Foil 4D was changed from OG to 
CM, which suggests that this foil, and a second foil [identifier 
illegible] which was unclassifiable, may have combined to lower this 
analysis item to a comprehension level. Right answer cluster C2 may, 
therefore, be a comprehension level cluster and not a double-strategy 
cluster as suggested earlier. These suggestions are too tentative to 
alter its "undetermined" classification. 

Further support for this foil hierarchy may be found by taking 
the new foil ranks and determining from these the average foil rank of 
each right answer cluster. Table 16 (see: p. 88) gives this 
information. 

Table 16 shows three additional characteristics which may add 
support to the concept of a foil hierarchy. First, the order conforms 
to Bloom's Taxonomy, with the apparent split strategy cluster falling 
between the analysis and undetermined categories. Second, evaluation 
fell out of order, as occurred in the Kropp, Stoker, and Bashaw (1966) 
study. Third, cluster C9, if legitimately classified by one foil, 
fell at the bottom. The Procrustes rotation match-to-content suggested 
that this cluster might be a [illegible] cluster. In this case, 
its location was also reasonable. 

Further support for the hierarchy can be found by determining 
the rank order of the right answer clusters in several ways. When the 
between and within item foil ranks, as given in Table 14, page 85, are 
used to calculate the right answer cluster rank, a highly significant 
correlation is found (ρ = .90, p < .001 for N = 9). This finding 
supports both the hierarchy and the apparent influence of foils upon 
item performance. All of the other possible rankings produced 
correlations which were not significant, including a comparison between 
ranking by average foil rank and average total-correct scores. 


Another interesting finding in the results just reported is 
the fact that the average difficulty of each cluster seems to be 
uncorrelated with the rank of each cluster. This correlation is ρ = .05 
and does not change much if within item rank is used or if the composite 
rank is used. This finding would seem to contradict the subsumption 
characteristic assumed to be part of Bloom's Taxonomy. 

From the data available in this study, another test of this 
subsumption hypothesis can be made. If the subsumptive property holds, 
successively higher members of the right answer hierarchy should be 
obliquely related. The Procrustes rotation gave a good fit to the 
target for the cluster analysis (see: p. [illegible]). In general, the 
relationship among the primary axes would seem to be orthogonal. 
However, the sampling distribution of these correlations is unknown, so 
that their statistical significance cannot be 
determined. In this case, the actual values of these correlations, 
arranged in order on the basis of the hierarchy of right answer clusters, 
may reveal a systematic pattern. Table 17 gives this data. 


TABLE 17 

POSSIBLE SYSTEMATIC OBLIQUITY 
BETWEEN ORDERED CLUSTERSᵃ 

a. This table is based on Table 10, page 69. 

If the point is stretched to the ultimate and correlations 
greater than or equal to an absolute value of .15 are considered 
oblique (|r| ≥ .15), there may be seven (7) of these fifteen (15) 
relationships that could possibly be considered as oblique. In this 
case three (3) of these relationships are oblique along the diagonal of 
Table 42, but all are within the analysis grouping. Four of these 
seven are found among the six of the analysis grouping. The other three 
are among the nine relationships outside the analysis grouping. The 
highest of these correlations still represented an angle larger than 70° 
(r = [illegible]), and the greatest proportion of these slightly oblique 
angles are found among analysis clusters. Such slight pattern as might 
have existed did not seem to be too important outside of the analysis 
group of clusters. The relationships among the clusters did not seem to 
support this assumed subsumptive relationship between levels of the 
Taxonomy. Kropp et al. (1966) found that the order from the simplex 
analysis did not consistently support this subsumptive property either 
(see: p. [illegible]). In short, the lack of correlation between average 
item difficulty and cluster order would seem to be further evidence 
towards the probable refutation of the assumed subsumptive relationship 
between the levels of Bloom's Taxonomy. 
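
The link between a correlation and the angle between two axes, which underlies the obliquity argument above, is the usual relation r = cos(angle). The sketch below is a minimal illustration of that conversion only; the function name is hypothetical, and the two reference values are the .15 threshold and the 70° angle mentioned in the text.

```python
import math


def angle_from_correlation(r):
    """Angle in degrees between two axes whose correlation is r (r = cos angle)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return math.degrees(math.acos(r))


# The .15 threshold used in the text corresponds to an angle of about 81 degrees.
threshold_angle = angle_from_correlation(0.15)

# An angle of 70 degrees corresponds to a correlation of about .34, so any
# correlation below that value represents an angle larger than 70 degrees.
r_at_70_degrees = math.cos(math.radians(70))
```

Orthogonal axes (r = 0) give 90°, so correlations in the range considered here describe axes that depart only slightly from orthogonality.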

The fact that the correlation between the order of the right 
answer clusters and the average total-correct score was negative 
suggests a general tendency for examinees to do better on low level 
items, which suggests a ceiling effect present in this difficult test. 
The test was also shown to be difficult by the average total-correct 
score, which was 12.19 out of a possible 30 items for Group A. This 
possibility found further support in the cross-validation part of the 
study. 

Cross-validation of the Analysis 

The multiple interpretation hypothesis supported thus far, and 
the apparent foil hierarchy, suggested two possibilities with respect to 
cross-validation. First, the attempts made to cross-validate these 
findings might not be successful. Second, other evidence for systematic 
responses may, therefore, have to be sought, such as the hierarchy, 
which might be supported more strongly in the cross-validation than the 
individual clusters. 

The total-correct means and variances for each of these two 
groups on the experimental test were used in a t-test for independent 
samples in order to confirm the equivalence of test scores of these 
two groups. Table 18 contains this data. 

TABLE 18 

COMPARISON BETWEEN GROUP "A" AND GROUP "B" 
ON TOTAL-CORRECT SCORES 

On the basis of the results given in Table 18, page 92, the two 
groups would seem to belong to the same population so far as the 
experimental test means and variances are concerned. 
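
A t-test for independent samples can be computed from the group means, variances, and sizes alone. The sketch below assumes the pooled-variance form of the test, which is an assumption here since the text does not say which form was used. The function name is hypothetical, and apart from the Group A mean of 12.19 reported earlier, every number in the example is invented for illustration.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Pooled-variance t statistic and degrees of freedom for two independent groups.

    var1 and var2 are unbiased sample variances.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Group A mean is from the text; the other values are hypothetical.
t_value, df = pooled_t(12.19, 9.0, 130, 12.05, 9.4, 128)
```

A small absolute t value relative to the critical value for the given degrees of freedom is what would support treating the two groups as samples from the same population.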


Cross-validation of right answers. Three different comparisons 
were used in the cross-validation of item clusters. 1) Those right 
answers from the advance classification which were used to help identify 
the item clusters were checked against the clusters of Group B. 2) The 
clusters which occurred from the answers of Group A were compared with 
the clusters from Group B. 3) The right answer clusters were divided 
into three groups of about equal size based on their average foil rank, 
and this division was cross-validated. Table 19 gives the first two of 
these comparisons. The code C' is used to refer to Group B's clusters 
(see: p. 94). 

The rule used in Table 19 (see: p. 94), once again, was the 
most frequent repetition of items within clusters for Group A and Group 
B. Only four of the 12 items (33 per cent) which were grouped by the 
advance classification and retained this grouping in the clusters were 
found to cross-validate in Group B. Thus, the advance classification 
holds up about as well (or as badly) in cross-validation as it did in 
the clustering. 

Comparing cluster by cluster, there were ten items (33 per cent) 
in the clusters from Group A which occurred in equivalent clusters for 
Group B. (One Group B cluster contained two members from one Group A 
cluster but also had all of C9 (items 15 and 18), which leaves its 
definition ambiguous.) In any case, it was evident that the clusters 
themselves did not cross-validate any better than the advance 
classification. 

TABLE 19 

CROSS-VALIDATION OF THE RIGHT ANSWERS OF 
GROUP A BY GROUP B FROM THE ADVANCE 
CLASSIFICATION AND THE ITEM CLUSTER 

Group A          Group B 

a. The numbers in italics cross-validate the advance classification. 

b. The starred (*) numbers cross-validate the item clusters. 


Table 20 (see: p. 95) has the right answer clusters arranged by 
their rank based on the average total-correct score as given in Table 16 
(see: p. 88), and arranged into groups of three clusters with C9 (items 
15 and 18) dropped. The cross-validation was then repeated. 

Instead of 33 per cent there is now 57 per cent cross-validation, 
although with only three groups to match instead of nine. An increase 
of this sort would be expected. More interesting is the distribution 
of the shifts between Group A and Group B clusters. If the pattern 
for Group A is taken as a reference and the order is considered to 
be fixed, a shift is considered to be + 1 if the mobile item in the 
Group A cluster is found in a Group B cluster associated with items 
from the cluster one step higher for Group A. Thus, a + 1 shift occurred 
for item 13, which clustered in Group B with item 5, where the latter is 
in the next higher Group A cluster. Similarly, item 10 would represent 
a - 1 shift. Table 21 summarizes these shifts. 

Table 21 (see: p. 96) shows that the mean absolute shifting of 
items between the two groups was slightly more than one place in the 
hierarchy (s = 1.11). This shifting was about one third of the shift 
expected if the items had randomly rearranged from Group A to Group B. 
If all possible shifts were equally probable, the mean shift would be 
3.24. In fact 22 of the 28 items shifted two steps or less, which 
accounts for 79 per cent of the items. 
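
The two summary figures used above, the mean absolute shift and the share of items shifting no more than a given number of steps, can be obtained as in the sketch below. The function name and the short list of shifts are hypothetical; the list is a toy example and not the data of the study. The last line only re-checks the reported proportion of 22 items in 28.

```python
def shift_summary(shifts, within=2):
    """Mean absolute shift and proportion of shifts of at most `within` steps.

    shifts holds signed shifts in hierarchy position (Group B minus Group A).
    """
    n = len(shifts)
    mean_abs = sum(abs(s) for s in shifts) / n
    share_within = sum(1 for s in shifts if abs(s) <= within) / n
    return mean_abs, share_within


# Hypothetical signed shifts for eight items.
toy_shifts = [0, 1, -1, 2, 0, -3, 1, 0]
mean_abs, share_within = shift_summary(toy_shifts)

# Arithmetic check on the proportion reported in the text: 22 of 28 items.
reported_per_cent = round(100 * 22 / 28)
```

Comparing the observed mean absolute shift with the value expected under random rearrangement then gives the ratio of about one third discussed above.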

A possible explanation for these shifts can be found in the 
hypothesized multiple interpretations. This hypothesis would suggest 
that in spite of the homogeneity of these two groups based upon 
total-correct scores, these two groups were obviously not homogeneous 
when it came to clustering of the items into item-homogeneous subtests. 
The clustering was based upon correlations which were sensitive to 
marginal totals. The stability of these marginal totals could be 
expected to be affected by the range of interpretations of the items 
within the groups concerned. This shifting could, therefore, be a 
product of the heterogeneity of the examinees and the effectiveness of 
the item in communicating a limited range of possible interpretations. 


Cross-validation of wrong answers. An identical procedure to 
the one used for right answers was used with the wrong answers. 
Table 22 (see: p. 98) gives the cross-validation between the advance 
foil classification and the individual foil clusters between Group A 
and Group B. 

In Table 22 we find that of the advance classification only 
three foils out of 16 (or 19 per cent) cross-validate as compared with 
the 16 out of 60 foils (or 26 per cent) which help to identify the 
clusters. Once again, the advance classification and the clusters 
cross-validate in about the same proportion. Wrong answers seem to 
cross-validate about as well as right answers. 

The third comparison, once again, was between foil clusters 
grouped into three groups based upon the average foil rank. Table 23 
(see: p. 100) gives this comparison. 

In Table 23 the high group again cross-validates best using the 
hierarchy. With one W cluster being dropped as uninterpretable, 27 of 
the remaining 51 foils (or 53 per cent) cross-validated compared with 16 
out of 28 (or 57 per cent) for right answers, and for a combined total 
of 43 out of 79 alternatives (or 54 per cent). 

Although the wrong answers showed a wider range of shifts, 30 
foils had a shift range of ± 3 or less (or 59 per cent) compared with a 
probable s = 4.83 for random shifts. Foils, though less stable, showed 
the same trend toward stability as right answers. Once again the 
multiple interpretation hypothesis could account for the lack of 
stability. 

The joint cross-validation combining right and wrong answers by 
group was: high group 21 out of 28 (or 75 per cent); middle group 14 
out of 28 (or 50 per cent); and low group 8 out of 23 (or 35 per cent). 
That is, the stability increases by about the same proportion (50 per 
cent) from low to high. This increase in stability was also found 
among the wrong answers for the Proverbs Test (cf. Powell, 1968). 
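
The joint cross-validation figures can be checked with simple arithmetic. The sketch below takes the middle group as 14 of 28 and the low group as 8 of 23, which are the counts consistent with the stated percentages and with the combined total of 43 out of 79 alternatives reported earlier; the variable and function names are hypothetical.

```python
def per_cent(hits, total):
    """Whole-number percentage of alternatives that cross-validated."""
    return round(100 * hits / total)


# (cross-validated, total) alternatives per group, as read from the text.
groups = {"high": (21, 28), "middle": (14, 28), "low": (8, 23)}

percentages = {name: per_cent(h, t) for name, (h, t) in groups.items()}
combined_hits = sum(h for h, _ in groups.values())
combined_total = sum(t for _, t in groups.values())

# Relative increase from one group to the next higher one.
low_to_middle = percentages["middle"] / percentages["low"]
middle_to_high = percentages["high"] / percentages["middle"]
```

Both ratios come out between 1.4 and 1.5, which is the sense in which stability rises by roughly 50 per cent at each step from the low to the high group.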


The Prediction Value of the Experimental Test 

In addition to the scores and individual responses on the 
experimental test, two achievement scores were obtained for most 
examinees. There was some loss of data, leaving Group A with 125 members 
and Group B with 120 members. 

The first achievement test (Achievement Test I) was given with 
the experimental test as a subtest of a midterm examination. The 
scores used for predictive purposes do not contain the experimental 
test scores. The second achievement test (Achievement Test II) was the 
final examination in the same course. The relationship among these 
tests is presented in Table 24. 


TABLE 24 

CORRELATIONS BETWEEN THE TESTS IN THIS STUDY 

                        Experimental   Achievement   Achievement 
                        Test           Test I        Test II 

Experimental Test       1.000 
Achievement Test I       .224          1.000 
Achievement Test II      .125           .414         1.000 


As shown in Table 24 the two achievement tests were moderately 
correlated (r = .414). The relationship between the experimental and 
achievement tests was considerably less, particularly with respect to 
Achievement Test II. 

In order to establish the predictive validity of the 
[illegible] for the experimental test, several comparisons were run 
using a step-wise multiple regression technique in all cases. Each 
achievement test was predicted separately for each of Group A and 
Group B. The following predictions were made. 

1. Total-correct score on experimental test predicting the 
total-correct scores of the achievement tests. 

2. Right-answer subtest scores on the experimental test with 
the subtests defined by the advance classification of items 
predicting the total-correct scores on the achievement tests. 

3. Combined right answer subtest scores and wrong answer 
subtest scores with the subtests defined by the advance 
classification, predicting the total-correct scores in the 
achievement tests. 

4. Scores on right-answer subtests defined by interpreted 
clusters used to predict the total-correct scores on the 
achievement tests. 

5. Scores on combined right-answer and wrong-answer subtests 
defined by all interpreted clusters used to predict the 
total-correct scores on the achievement tests. 

6. Scores on right-answer subtests defined by grouping 
right-answer clusters by the foil hierarchy used to predict the 
total-correct scores on the achievement tests. 

7. Combined right- and wrong-answer subtest scores defined by 
grouping of clusters on the basis of the foil hierarchy 
used to predict the total-correct scores on the achievement 
tests. 

In each case the total-correct score of each achievement test 
was being predicted. 

Table 25 (see: p. 103) gives the results of these predictions 
for Group A. 












TABLE 25 

STEPWISE REGRESSION OF GROUP A DATA USING 
SEVERAL COMBINATIONS OF VARIABLES 

                                     Combinations of Variablesᵃ 
                                  1     2     3     4     5     6     7 
Achievement Test I 
  R                               [values illegible] 
  R²                              [values illegible] 
  No. of Predictors available     1     5     14    6     18    3     6 
  No. of Predictors usedᵇ         [illegible]  5  14  3     18    None  2 
Achievement Test II 
  R                               [values illegible] 
  R²                              [values illegible] 
  No. of Predictors available     1     5     14    6     18    3     6 
  No. of Predictors usedᵇ         [illegible]  [illegible]  9  None  [illegible] 
Total number of right 
  answers used                    30    30    30    19    [illegible] 
Total number of foils used        0     0     70    0     [illegible] 

a. The combinations of variables are defined by number as given on 
page 102. 

b. Only those predictors which made a significant contribution (p < .06) 
to the prediction were included on this table. 


Table 25 gives the correlation for the total-correct score 
on the experimental test and the multiple correlation achieved for 
significant variables (p < .06). The squared multiple correlations (R²) 
are also given to indicate the amount of variance accounted for in the 
predictions. 


A similar set of data for Group B follows in Table 26 
(see: p. 104). 

TABLE 26 

STEPWISE REGRESSION OF GROUP B DATA USING 
SEVERAL COMBINATIONS OF VARIABLES 

                                     Combinations of Variablesᵃ 
                                  1     2     3     4     5     6     7 
  R                               [values illegible] 
  No. of Predictors available     1     5     14    6     18    3     6 
  No. of Predictors usedᵇ         1     3     14    1     14    None  5 
  Total number of right 
    answers used                  30    30    30    19    19    28    28 
  Total number of foils used      0     0     70    0     16    0     [illegible] 

a. The variables used with Group B were identical in definition to 
those used in Group A. 

b. Only those predictors which made a significant contribution 
(p < .06) to the prediction were included in this table. 


There are two considerations relevant to Tables 25 and 26. 
First, the value of the procedure is partly determined by the amount of 
variance accounted for (as given by R²). Using this criterion, there is 
a consistent improvement in prediction when the scores on wrong-answer 
subtests are included in the analysis. When this was done, the 
wrong-answer variables, in general, accounted for more of the variance 
than the right-answer variables within the same solution. The interpreted 
clusters give a better solution than the advance classification. Both 
of these were better than the grouping by the foil hierarchy. The 
poorest predictor was the total-correct score on the experimental test. 
The second consideration is the statistical significance of the 
improvement of these values of R² when compared with each other. The 
formula for this comparison is: 

F = \frac{(R_1^2 - R_2^2)(N - K - 1)}{(1.00 - R_1^2)(K - L)} 

where N is the number of persons, 
K is the number of independent predictors in R_1^2, 
and L is the number of independent predictors in R_2^2. 

This procedure gives the usual "F" test with degrees of freedom 
N - K - 1 (denominator) and K - L (numerator) respectively. The results 
of these calculations derived from the data in Tables 25 and 26 (see: 
pp. 103 and 104) are 
given in Tables 27 to 30 which follow (see: pp. 106 to 109). 
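
The comparison of two squared multiple correlations can be sketched as follows. The code assumes the standard F ratio for the increase in R² when a solution with K predictors is compared with one using L predictors (K greater than L) on N persons: F = [(R²_K - R²_L) / (K - L)] / [(1 - R²_K) / (N - K - 1)]. The function and parameter names are hypothetical, and the example values are invented; only the group size of 125 and the predictor counts of 14 and 5 echo figures given in the text.

```python
def f_for_r2_increase(r2_full, k, r2_reduced, l, n):
    """F ratio and (numerator, denominator) df for the gain in R squared.

    r2_full uses k predictors, r2_reduced uses l predictors, n is the number
    of persons. The ratio is indeterminate when k equals l.
    """
    if k == l:
        raise ValueError("K - L is zero, so the F value is indeterminate")
    df_num = k - l
    df_den = n - k - 1
    f_value = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_value, (df_num, df_den)


# Hypothetical R squared values for a 14-predictor and a 5-predictor solution.
f_value, dfs = f_for_r2_increase(0.30, 14, 0.20, 5, 125)
```

The resulting F value is then referred to an F table with the two degrees of freedom returned. The explicit error for K equal to L mirrors the case in which no F value can be computed because the two solutions use the same number of predictors.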

Table 27 gives the significance of the difference between the 
predictions of Test I for Group A. It shows that the sequence 1 < 2 = 
3 < 4 < 5 stands clearly in the diagonal. One factor which may be 
involved in the equality 2 = 3 may be the fact that the number of 
predictors increases from 5 to 14. If all other values remain the same, 
an increase in the size of the sample of less than 50 per cent would 
make the difference between combinations 2 and 3 significant. Also, no 
variables made a significant contribution in combination 6, and 2 
variables were significant in combination 7, which makes 7 a 
significantly better predictor than 6. Thus, for Group A when predicting 
Test I there is a consistent tendency for the combined right and wrong 
answers to be better predictors than the right answers alone. 

TABLE 27 

SIGNIFICANCE OF DIFFERENCES BETWEEN R²'s 
FOR GROUP A WHEN PREDICTING TEST I 

R² for variable combinations (rows) against R² for Variable 
Combinationsᵃ (columns), with F and pᵇ for each comparison. 

[Table body illegible in the source.] 

a. Definitions for these variable combinations are given on p. 102. 

b. Only the probability (p level) for significant differences are shown. 


Table 28 gives the significance of the differences between the 
predictions of Test II for Group A (see: p. 107). 

There is no similar pattern when predictions are made to the 
future Test II as compared with the concurrent Test I. Some of the F 
values in the equivalent diagonal are large enough that a larger sample 
might make them significant. At least all predictor combinations are 
better than the total-correct score. Although the grouping of 
alternatives (combinations 6 and 7) on the basis of the hierarchy does 
not yield significant differences, these two variables tend to have the 
largest F values. Once again, a larger sample size might have made the 
difference. The cross-validation of the multiple regression coefficients 
which follows this section casts further light on this aspect of the 
problem. 

TABLE 28 

SIGNIFICANCE OF DIFFERENCES BETWEEN R²'s 
FOR GROUP A WHEN PREDICTING TEST II 

R² for variable combinations (rows) against R² for Variable 
Combinationsᵃ (columns), with F and pᵇ for each comparison. 

[Table body illegible in the source.] 

a. Definitions for these variable combinations are given on page 102. 

b. Only the probability (p level) for significant differences are shown. 

c. * means: Denominator 0, F value indeterminate. 

Table 29 (see: p. 108) gives the significance of the differences
between the predictions of Test I for Group B.

TABLE 29

SIGNIFICANCE OF DIFFERENCES BETWEEN R²'s
FOR GROUP B WHEN PREDICTING TEST I

a. Definitions for these variable combinations are given on page 102.
b. Only the probabilities (p levels) for significant differences are shown.
c. * means: Denominator 0, F value indeterminate.


Table 29 is very similar to Table 28 except that none of the predictor combinations is significantly better than any other, including the total-correct scores. These findings are not surprising considering the low level of cross-validation already found for all combinations except the grouping based upon the hierarchy. Although none of the F values for these groupings (Combinations 6 and 7) are significant, these two combinations have the highest F values. Once again a larger sample size might have made the difference.

Table 30 gives the significance of differences between
predictions of Test II for Group B.


TABLE 30

SIGNIFICANCE OF DIFFERENCES BETWEEN R²'s
FOR GROUP B WHEN PREDICTING TEST II

[Table body illegible in the source scan. Rows and columns are the R² for variable combinations (a); cells give F and p (b).]

a. Definitions for these variable combinations are given on page 102.
b. Only the probabilities (p levels) for significant differences are shown.
c. * means: Denominator 0, F value indeterminate.

Table 30 shows Combination 7 to be a highly significantly better predictor in four of the six cases. When comparing Combination 3 with 7, the R² is larger for Combination 7; but the number of significant predictors is larger for Combination 3, and for this reason the comparison gives a negative value in the formula. Logically, Combination 7 would, therefore, be the better predictor in this case as well. Combination 5 has a somewhat larger R² but requires many more predictor variables to achieve this, making the difference clearly insignificant.
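
The sign behaviour described here can be illustrated with a short sketch. The formula used in the study is not reproduced in this section, so the sketch assumes the conventional F ratio for the difference between two squared multiple correlations, F = ((R²a - R²b) / (ka - kb)) / ((1 - R²a) / (N - ka - 1)), where k is the number of predictors and N the sample size. The function name and every number below are hypothetical and are not values from the tables.

```python
import math


def f_for_r2_difference(r2_a, k_a, r2_b, k_b, n):
    """F ratio for the gain of combination a over combination b.

    Assumed form (not taken from the source):
        ((r2_a - r2_b) / (k_a - k_b)) / ((1 - r2_a) / (n - k_a - 1))
    Returns NaN when both combinations use the same number of predictors,
    because the ratio then has a zero divisor.
    """
    df_num = k_a - k_b
    if df_num == 0:
        return math.nan
    df_den = n - k_a - 1
    return ((r2_a - r2_b) / df_num) / ((1.0 - r2_a) / df_den)


# Hypothetical values: the larger R^2 uses MORE predictors, so F is positive.
f_pos = f_for_r2_difference(0.30, 6, 0.20, 2, n=100)

# The larger R^2 uses FEWER predictors, so the ratio turns negative.
f_neg = f_for_r2_difference(0.30, 2, 0.20, 6, n=100)

# Equal numbers of predictors: the ratio is indeterminate.
f_nan = f_for_r2_difference(0.30, 4, 0.20, 4, n=100)
```

Under this assumed form, a combination that achieves a larger R² with fewer predictors yields a negative ratio, and two combinations with the same number of predictors yield an indeterminate one, which would be consistent with footnote c of the tables.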

Three trends seem to emerge from these data. First, combined right and wrong answers generally seem to yield the best predictions.

Second, the best prediction of a concurrent test seems to be found by using the interpreted clusters for the same group.

Third, when predicting remote events or the results of another group, combining the predictor variables on the basis of the hierarchy would seem to give the best predictions.


Cross-Validation of the multiple regression coefficients. It is possible to cross-validate multiple correlations by finding the vector product of the validity coefficients for one group and the standardized regression weights for the same variables from the other group as follows:

\[ R = V'W \]

where V' is a row vector of the validity coefficients for one group,
W is a column vector of the standardized regression weights for the corresponding variables from the other group,
and R is the resulting vector product.
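
As a minimal sketch of this calculation, the vector product can be computed directly. The function name and the coefficients below are hypothetical; only the operation itself (validity coefficients from one group multiplied by standardized weights from the other) comes from the description above.

```python
import numpy as np


def cross_validated_product(validities, weights):
    """Vector product V'W.

    validities: validity coefficients (predictor-criterion correlations)
                from one group.
    weights:    standardized regression weights for the same variables,
                in the same order, from the other group.
    """
    v = np.asarray(validities, dtype=float)
    w = np.asarray(weights, dtype=float)
    if v.ndim != 1 or v.shape != w.shape:
        raise ValueError("V and W must be 1-D vectors over the same variables")
    return float(v @ w)


# Hypothetical coefficients for three predictor variables.
v_group_b = [0.30, 0.20, 0.10]   # validities observed in Group B
w_group_a = [0.25, 0.15, 0.05]   # standardized weights estimated in Group A
r_cross = cross_validated_product(v_group_b, w_group_a)
```

When both vectors come from the same group, the same product reproduces that group's squared multiple correlation, so the cross-group value shows how much of the fit carries over to the other sample.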

Since the combining of right and wrong answers seemed to give the best results, only these combinations of variables were cross-validated in this manner. Table 31 (see: p. 112) gives the results of these calculations.

To begin with, Table 31 shows that Combination 3, the advance classification, does not survive cross-validation. Clearly, the best predictor of the concurrent test for Group A was the interpreted clusters (Combination 5), a finding consistent with the findings for the significance of differences. This combination (5) of variables did not cross-validate in any other situation.

For the remote test, Combination 7 proved to cross-validate very well, much better than the significance of the differences would suggest. For Group B, Combination 7 cross-validated about as well as the proportion of item-for-item cross-validation would suggest that it should. These findings also tend to support the suggestion that grouping on the basis of the hierarchy may be the best method for predicting future performance or performance in another group.

The reader is cautioned on two points. First, the correlations between the total-correct scores of the experimental test and the achievement tests suggest that these tests may be dissimilar in the characteristics they are measuring. This situation would be expected to produce lower multiple correlations than tests of greater similarity might achieve. Second, near-zero multiple correlations are easier to cross-validate than higher ones, so that the relative stability of these correlations can only supply suggestive results by themselves.

Summary of Chapter V

Briefly, the findings as reported in this chapter were as follows:

1. Interpoint distance gave the best statistical solution to the data being considered in this study.

2. Logico-semantic analysis provided reasonable support to the construct validity of the procedure used, provided that alternative classifications are permissible, and interpretations were confined to the group under study.

3. There may be a hierarchy of foils which parallels Bloom's Taxonomy and which may influence the way in which items perform.

4. None of the cross-validations were very strong, with the hierarchy tending to be somewhat better supported than other aspects of the analysis.

5. Wrong answers, in general, tended to add significantly to the predictions of both concurrent and future achievement whenever significance was found.

6. Within one group the interpreted clusters gave the best concurrent prediction; otherwise the grouping on the basis of hierarchy seemed to produce the best prediction.

CHAPTER VI

CONCLUSIONS AND IMPLICATIONS

The conclusions which can be drawn from this study are discussed in three sections: first, the conclusions which are relevant to the experimental test used in this study; second, the conclusions which are relevant to the analytical procedure; and finally, those which are relevant to the systematic response postulate.

The implications which can be drawn from this study are discussed in four sections. First, there are the implications of the results of the use of this analytic procedure to the theory of test analysis procedures. Second, there are the implications to the design, construction, and interpretation of taxonomic tests. Third, there are a number of implications of this study to educational practice. Finally, this study has opened enough possibilities for future research that these are discussed in a separate section.


Conclusions Related to the Experimental Test 

Superficially, the experimental test used in this study would 
seem to have been a weak instrument but, as will be seen, the criteria 
usually used for evaluation may not have been applicable to this test. 
Using the usual criteria, for instance, the selection ratio for the 
right answers on most of the items was low; the biserial correlations 
of the items to the total correct scores were also low relative to the 
size of the corresponding difficulty ratios; the internal consistency 
based upon the Kuder-Richardson procedure was low, and the interrater
reliabilities left a great deal to be desired.


On the other hand, the usual criteria employed for the evaluating of tests may not be appropriate for this one. Briefly, the desirability of middle difficulty items is a criterion based upon the assumption that this level of difficulty maximizes the discrimination of the test when all the items are dichotomous variables. When all alternatives are being considered, rather than when the item is being considered either right or wrong, this criterion seems no longer to apply.

The biserial correlation of the test items taken against the total correct scores is a criterion of the discriminating power of these items assuming that the test as a whole is highly homogeneous. The low internal consistency of this test suggests that the test was not homogeneous. Evidence for the lack of homogeneity in this test can be found in the logic of the construct model (Bloom's Taxonomy), and in the fact that the test subdivided in the cluster analysis into ten clusters of right answers, at least six of which were nearly orthogonal. Also, the loadings of the items on these nearly orthogonal factors were very nearly the original lengths of the vectors in the principal axis matrix from which they were derived. This latter result suggests that the internal consistency within clusters was substantially higher than within the test as a whole. For these reasons, the usual criteria may not apply to this test.

On the positive side, the average total correct score for persons selecting each alternative was higher for the right alternative than for all others used, by at least [value illegible] points, in 37 out of 58 cases (or 64 per cent), which gives some support to the strength of the test (see: Table 40, p. 150). Although the clusters did not cross-validate very well, when the shifting of items into other clusters was considered, it became evident that the items and foils were much more stable than would be suggested by chance alone. Also, the clear evidence for an interacting hierarchy of items and foils would seem to provide strong support for the possibility that the instrument was measuring some systematic characteristics of the examinees.

Precisely what these characteristics were would seem to be more ambiguous than the probability that they were being measured. They were, however, clearly process-oriented characteristics. The doubt about precise definitions arose from two sources: 1) the lower than desirable interrater reliabilities, and 2) the lower than desirable cross-validation of the clusters. Both these weaknesses in the results, however, may be a property of this type of test rather than a problem peculiar to it.

If the proportion of the total variance used in the factor 
solution is considered, there was 37.7 per cent accounted for by the six 
factors for the items and another 48.6 per cent for the 15 factors used 
for the wrong answers, suggesting a higher internal consistency than the 
Kuder-Richardson results suggested. 

Thus, the instrument displays the following properties:

1. Significantly improved prediction of independent concurrent achievement scores for the same group of examinees using interpreted clusters from both right and wrong answer clusters combined over the other combinations of scores tested.

2. A clear hierarchical pattern of both items and foils. The right answers and the foil selections seem to interact and to be relatively stable within the hierarchy under cross-validation when the range of shifts is considered.

3. An overall discrimination apparently based upon process-oriented events rather than content-oriented events.

The construct objective was to produce a process-oriented taxonomic test which had predictive value relevant to achievement variables. Whatever criticisms might be made of the test, the results made it clear that it met this construct objective to a reasonable degree, and for this reason it may be taken to be a valid test. Also, the indirect evidence suggested that the test was probably more reliable than was suggested by the direct evidence. Precisely which procedures should be used to establish validity and reliability estimates for tests of this type is not yet clear.


Conclusions Related to the Analytic Procedures Used 

The analysis began with phi coefficients. The use of these coefficients can be defended on the grounds that none of the assumptions which are made for these coefficients was violated by their use. The two variables being related for each coefficient are discrete since each represents the selections made by the members of the same group for different alternatives. They were dichotomous because an accept-reject decision applies to all alternatives as a requirement of the response procedure. Linear dependencies were removed from the data by partitioning the matrix. Finally, since all values were expressed as frequencies of occurrence, the categories were amenable to appropriate representation by two-point values.
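
For illustration, a phi coefficient between the selections of two alternatives can be computed from the fourfold table of accept-reject decisions. The sketch below is hypothetical: the function name and the eight-examinee data are invented, and 1 is assumed to code "selected" and 0 "not selected".

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two 0/1 selection vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("both vectors must cover the same examinees")
    # Cell counts of the fourfold (2 x 2) table.
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    # Product of the four marginal totals.
    marginals = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if marginals == 0:
        return math.nan  # an alternative chosen by everyone or by no one
    return (n11 * n00 - n10 * n01) / math.sqrt(marginals)


# Hypothetical selections of two alternatives by eight examinees.
alt_a = [1, 1, 1, 0, 0, 0, 1, 0]
alt_b = [1, 1, 0, 0, 0, 1, 1, 0]
phi_ab = phi_coefficient(alt_a, alt_b)
```

Because the marginal totals enter the denominator, the value depends on how often each alternative is chosen as well as on how often the two are chosen together.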

The resulting large matrices of phi coefficients were simplified by principal axis factor analysis in order to remove as much measurement error as possible, and to maximize the variance accounted for by any particular number of dimensions in the space being used. Beyond this point the procedures seemed to divide into two aspects: those which are commonly used for the study of tests and their results, and those which are not commonly used but which are specifically selected to meet problems which may arise in the interpretation of the results.
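
The reduction step can be sketched as an eigen-decomposition of the coefficient matrix. This is a simplified stand-in, not the study's procedure: it keeps unities in the diagonal (a principal components solution), because the communality estimates used in the study are not stated here. The function name and the 3 x 3 matrix are hypothetical.

```python
import numpy as np


def principal_axes(corr, n_factors):
    """Loadings on the first n_factors principal axes of a correlation matrix.

    Returns (loadings, proportion), where proportion is the share of the
    total variance (the trace) accounted for by the retained axes.
    """
    corr = np.asarray(corr, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]    # largest first
    vals = np.clip(eigvals[order], 0.0, None)        # guard tiny negatives
    loadings = eigvecs[:, order] * np.sqrt(vals)
    proportion = vals.sum() / np.trace(corr)
    return loadings, proportion


# Hypothetical matrix of phi coefficients for three alternatives.
phi = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
loadings, proportion = principal_axes(phi, n_factors=2)

# Squared length of each variable's vector in the reduced space.
vector_lengths_sq = (loadings ** 2).sum(axis=1)
```

The squared vector lengths show how much of each variable survives the reduction, which is the quantity referred to when loadings are compared with "the original lengths of the vectors in the principal axis matrix".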

The results of this study derived from the commonly employed procedures were uniformly ambiguous, inconclusive, or negative, whereas the less common procedures uniformly produced significant results. As indicated, the test itself would seem to have been of questionable value if it were evaluated by the commonly used procedures, and yet it clearly met the construct properties it was designed to meet.

The Procrustes rotations of the factor matrix to fit either content or process in the advance classification produced negative results. The results of the usual analytic rotations on the factor matrices were not reported in Chapter V because they were as uninterpretable as the Procrustes rotations. However, when interpoint distance clusters were used to avoid the problem of rotation, a set of nearly orthogonal groupings of the variables was produced. Unquestionably, the latter approach produced a statistically satisfactory representation of the data, leaving the researcher with the problem of interpretation to be resolved by non-statistical methods.

Cross-validation of clusters was equally disappointing, and contradicted the results of the t-test for uncorrelated samples based upon total correct scores. Substantial improvements in the proportion of cross-validation were found when the apparent hierarchy of items and foils was taken into account. Additional support for the cross-validation was found in the pattern of shifts of alternatives which occurred among the clusters between the two groups. The data were much more systematic than the usual procedures seemed to suggest.


Using total-correct scores to establish a hierarchy of foils on both a within-item and a between-item basis produced results with a moderate relationship. When these two procedures were used to order the right answer clusters, however, a highly significant similarity between these latter two orderings was found, leaving little doubt that an interactive ordering between the right and wrong answers was a systematic characteristic of the data. This ordering was unrelated to the total-correct averages for the right answer clusters and to item difficulty, making the use of total-correct scores for the establishment of the hierarchy questionable. However, the shift patterns of the cross-validation suggested a much more stable result than did the cross-validation alone, suggesting that these shift patterns might be used to determine the ordering of the clusters instead of using total-correct scores.

Although all predictions were low, the use of the interpreted clusters did tend to give significantly better concurrent prediction of the total-correct scores on the independent concurrent achievement measure for the group on whom the interpretation was attempted. The amount of variance accounted for increased roughly three times in this case. The question of the validity of the use of the total-correct scores as adequate representations of achievement on the achievement measures was not explored. Finally, the hierarchy also proved to be more broadly stable during cross-validation than other variables.

Evidently, the more conclusive results were found by the less common procedures. Since these procedures were designed to fit the specific problems raised in this study, and since the statistical adequacy of these procedures proved to be beyond question, they would seem to be, collectively, a more adequate method of approaching the kind of data this study produced than the more common procedures would seem to be.





The adequacy of the experimental test and of the analytic procedures used for the purpose of this study seems to be established to a reasonably acceptable level. Taken alone, the data were sufficiently systematic in both right and wrong answer matrices to establish that "most if not all of the answers given to multiple choice achievement tests are selected upon a systematic basis" may be a reasonable approach to human performance. The two findings most relevant were the presence of the interactive hierarchy and the increase in predictive validity evident when wrong answer clusters were included in the regression equations.

If the evidence supporting the multiple interpretation hypothesis is included, the support for this psychological postulate is greatly increased. To begin with, the negative results from the Procrustes rotations for the advance classification by both content and process established the inadequacy of this approach. Logico-semantic analysis clearly established that process variables provided the best interpretation of the clusters. But the failure of the Procrustes rotation with the advance classification, the low cross-validation levels and the shift patterns clearly indicated that the classifications are not mutually exclusive.


Kropp et al (1966) found that the same subtests were differently defined by the Kit for different grade levels, and Powell (1968) reported only about 60 per cent cross-validation by reported reasons for the selection of particular wrong answers. Also, the higher level alternatives tend to be more stable than the lower ones when taken in combination, which is consistent with other findings (see: Powell, 1968). Thus, independent studies also report findings suggesting that the classifications of answers may not be mutually exclusive. A reasonable synthesis of these findings would be to suggest that each item may be interpreted in a variety of different ways. That is, the poor showing on cross-validation and the lower than desirable interrater reliabilities may be a product of multiple interpretations of the items. Strong support for this hypothesis was found in this study in the systematic character of many of the shifts which occurred among the alternatives. These shifts were sufficiently small in range for most items that the possibility of their use in the establishment of order among the variables could be proposed (see: p. 97). Further support was found in the fact that the prediction of the concurrent test on the interpreted group was the only case where the interpreted clusters had a distinct advantage. Predictions based upon the hierarchy, on the other hand, seemed to be less powerful, but more stable over a broader range of time and population. These findings support the multiple interpretation hypothesis as well because they suggest the short-range applicability of specific interpretations.

Perhaps the range of this applicability of interpreted findings 
could be increased if the heterogeneity of the examinees upon whom the 
interpretations are made were reduced. 


In summary, the conclusions of this study were:

1. Human performance, when abstracted from responses to multiple choice achievement tests involving higher mental processes, would seem to be systematic, and to display evidence of multiple interpretation of the communication.

2. There would seem to be a hierarchy of foils which parallels the hierarchy of right answers and which influences the way in which each total item performs. The levels of the foils themselves seem to depend upon the systematic ways in which this totality of each item is approached.

3. Wrong answers contain potentially useful information with respect to achievement when higher mental processes are involved.

Before the implications of this study are discussed, a statement should be made concerning the limitations to generalizability apparent in this study.


Limitations to Generalizability 
There are several restrictions to the generalizability of the findings of this study which can be derived from the nature of the study and its conclusions. First, the findings of this study do not apply to multiple choice achievement tests where the simple recall of information is the only characteristic being measured. The experimental test was process rather than content oriented, and Knowledge level items were considered inappropriate to its format; hence this limitation.

Second, the findings of this study do not apply where the cost of the additional effort required to obtain and interpret categorical information on the examinees is greater than the cost of the information loss, and the possible misclassification attendant thereto, incurred by using the much simpler total-correct score method of evaluation.

Third, these findings may not apply when the most competent are being screened from already competent individuals for some specific purpose. A stronger statement in this respect cannot be made because later research may show that wrong answers may supply valid information for the purpose in question. For instance, Irr (Irrelevancy) foils may identify the most creative individuals among the high performers.

Finally, if a researcher has a valid reason to evaluate the effectiveness of a single treatment given to a heterogeneous group by using a single ordinal dimension for the particular group in question, the findings of this study clearly do not apply.


Implications of This Study to the Theory of Test Analysis


There are several situations where the findings of this study 
are very important to test theory. The fact that the more common 
procedures tended to give ambiguous, inconclusive or negative results 
raises a number of pertinent questions. 


Classical test theory begins with the assumption that

\[ X_i = T_i + E_i \]

where X_i is the observed score of the ith individual,
T_i is his true score, and
E_i is the measurement error.


However, for multiple choice achievement tests, this observed score (X_i) itself is usually a composite entity obtained from the summation of single events as follows:

\[ X_i = \sum_{j=1}^{n} x_{ij} \]

where x_{ij} is binary, being 1 if the ith individual answered the jth item correctly, otherwise 0, and
n is the number of items in the test.
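
A small sketch of this summation, using invented response patterns, shows why the homogeneity of the total-correct score is at issue: two examinees can obtain the same observed score from entirely different items. The function name and data are hypothetical.

```python
def total_correct(item_scores):
    """Sum one examinee's binary item scores (1 = correct, 0 = otherwise)."""
    if any(score not in (0, 1) for score in item_scores):
        raise ValueError("item scores must be 0 or 1")
    return sum(item_scores)


# Hypothetical examinees on a six-item test.
examinee_a = [1, 1, 1, 0, 0, 0]
examinee_b = [0, 0, 0, 1, 1, 1]

x_a = total_correct(examinee_a)
x_b = total_correct(examinee_b)

# Number of items both examinees answered correctly.
overlap = sum(a * b for a, b in zip(examinee_a, examinee_b))
```

Here both totals are equal although the two examinees share no correct item, so the single number X conceals a complete difference in response pattern.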

The issue that arises here is whether X_i is sufficiently homogeneous for all individuals to justify the use of classical test theory. If it is not, as was evident in the present study, then an alternative approach to the data would seem to be needed, since the more common procedures proved unsatisfactory in this study.

Within the context of the present study, several considerations must be met by the alternative approach. The phi coefficients used are extremely sensitive to the magnitudes of marginal proportions. For this reason, if particular alternatives are selected for a different range of reasons among two samples of individuals, it would be expected that these alternatives would migrate to new clusters for reasons of systematic differences between groups rather than for reasons of measurement error. Also, if the range of reasons within a group of examinees were too broad, the interpretation of clusters would be expected to be difficult and possibly not applicable to specific individuals. That is, the first assumption made would be that different reasons for the selection of a particular alternative would be reflected by differences among overall patterns. These arguments suggest the need at the outset for a homogeneous group of examinees. The key to homogeneity in this study would seem to be the shifts in category which occurred upon cross-validation. Perhaps a homogeneous group should be defined in terms of minimizing the shifts which occur in the clustering of the alternatives for any, or all possible, random assignments of the group members to an arbitrary number of groups. In short, the procedure should probably begin by selecting groups of individuals with maximum cross-validity within these groups.

The clusters for such groups will be as stable as possible on the basis of the determination of their composition. Hence, the possibility of interpreting the resulting clusters should be optimized, as should the applicability of these interpretations to the individuals within the groups.

With the clusters thus stabilized it should be possible to determine the clusters using all the data rather than a simplification of it, since the surface-to-surface interpoint distance (d) between the ends of the vectors within the hypersphere can be found quite simply by assuming that phi (φ) is the cosine of the angle between the arms of the isosceles triangle produced by the vector pairs. In this case, for vectors of unit length, the distance (d) is

\[ d = \sqrt{2(1 - \phi)} \]
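
The geometry can be sketched numerically. By the law of cosines, two arms of equal length r enclosing an angle whose cosine is phi are joined by a chord of length r times the square root of 2(1 - phi). The sketch assumes unit-length vectors by default, which is an assumption made here for illustration; the function and variable names are hypothetical.

```python
import math


def interpoint_distance(phi, length=1.0):
    """Chord between the ends of two equal-length vectors.

    phi is taken as the cosine of the angle between the vectors;
    length is the common vector length (assumed to be 1 by default).
    """
    if not -1.0 <= phi <= 1.0:
        raise ValueError("phi must lie between -1 and 1")
    return length * math.sqrt(2.0 * (1.0 - phi))


d_identical = interpoint_distance(1.0)    # perfectly related alternatives
d_orthogonal = interpoint_distance(0.0)   # unrelated alternatives
d_opposed = interpoint_distance(-1.0)     # perfectly opposed alternatives
```

Distances computed this way run from zero for identical selection patterns to twice the vector length for opposed ones, so clusters can be formed from the phi coefficients directly, without a prior factor reduction.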


A tighter level of homogeneity is possible among individuals who have essentially the same response patterns, but these individuals would tend to have only one meaningful cluster composed of all or most of their responses, thus rendering interpretation impossible.

The dimensionality of the data from a homogeneous (by cross-validation) group of individuals would be determined by the minimum number of homogeneous clusters which can be extracted before orthogonality between the centroids of the clusters begins to disappear. Thus, the proposed procedure just outlined, as related to the problems raised by the data in this study, provides a unique solution once the stop criteria are established. Each cluster would be categorical (in the sense given on page 47), and optimally interpretable assuming that differences in interpretation of the communication by the examinee are characterized by differences in selection pattern.

There may, as the evidence from this study suggests, be an order among the categories which can be determined by the shifts which occur during cross-validation. Poor items would be unstable under cross-validation.

So far as the categories themselves are concerned, these would be expected to be of two types: 1) nominal, and 2) ordinal. Nominal categories would be expected to be bimodal, with the modes tending to polarize at the extremities of the potential range within the category. Ordinal categories would be expected to display scalability characteristics within their potential range.

Relationships among categories other than the ordinal (hierarchical) one would probably be determinable from the relationships among the centroids of the categories. For instance, Powell and Isbister (1969) found a polarity between Invalid Assumptions for the wrong answers and Synthesis items among the right answers. In such a case it might be unnecessary to partition the matrix to remove linear dependencies. If this latter facilitation could be provided by this procedure, a homogeneous test in the classical sense would be one in which the right answers formed a single cluster of the ordinal type. Thus, the proposed procedure just given would seem to contain the characteristics of classical test theory as a special case.

It would be reasonable, then, to argue that the findings of this study suggest the need for alternative procedures to the ones in common use for the analysis of data from multiple choice tests, and the findings suggest a particular procedure which contains the commonly used procedures as a special case.


Implications of This Study Concerning Taxonomic Tests 

1. It is a classification system.

2. It is hierarchically ordered on a "complexity" dimension.

3. Each higher level is formed by combinations of the lower levels.

The two properties of classification and ordering among classes 
combine to distinguish a taxonomy from other classification systems. 
Thus the evidence from this study supports the description of both 
Bloom's Taxonomy and the Guidelines as taxonomies. Noting the evident 
interrelatedness of these two taxonomies in this study suggests that 
they may both be part of a single taxonomy. 

Concerning the "complexity" dimension, Bloom said, "Our attempt to arrange educational behavior from simple to complex was based upon the idea that a particular simple behavior may become integrated with other equally simple behaviors to form a more complex behavior [page 18]."

The findings of this study, which produced no relationship between total-correct scores and the hierarchy where right answers were concerned, and no relationship between average item difficulty within clusters and the hierarchy, did not help to identify the meaning of the term "complexity." Since the hierarchy could apparently have been produced through cross-validation procedures without recourse to total-correct scores, the meaning of "complexity" becomes even more vague. However, the finding that "higher" level members of the taxonomy tend to be more stable than lower levels suggests that these categories may be the product of "more powerful strategies."

The third aspect of Bloom's Taxonomy, as noted previously, 
also deserves attention. This aspect of his definition 
would seem to arise as an hypothesis from his definition of the 
"complexity" dimension. About this subsumptive property of the 
Taxonomy Bloom himself said that the evidence he could collect to 
support this property was inconclusive (Bloom: p. a) 

Other evidence concerning this subsumptive property is meagre. 
Kropp et al (1966) did not find the clear reproduction of the pattern 
which they expected to find in the factor analysis of their tests. 
Also, their Simplex analysis did not produce the 
consistent order that this property would predict (see: pp. 24-25). 

Powell and Isbister (1969) found that a Promax rotation did not 
improve the resolution of the relationships between subtest scores based upon the 
advance classification for essentially the same test as used in this 
study. The subtests as defined in this study by cluster analysis 
rather than by advance classification showed this same tendency to 
orthogonality. It is premature to be dogmatic, but it is possible that 
this subsumptive property may be a hypothesis which will be refuted by 
the evidence. Alternative analytical procedures such as the one 
outlined above may be needed to settle this issue conclusively. 

There are alternative theoretical positions which would predict 
the possibility that strategic categories may be discrete and 
hierarchically ordered by "power" rather than subsumptive. Piaget (1963), 
for instance, has suggested that development may involve shifts 
(alternatively, "the acquisition of new strategies" [see: Powell, 1967, 
p. 286]), in which case development may be expected to 
proceed in a series of discrete stages, each of which would 
be expected to have its own distinctive characteristics. Such evidence as 
is available, in particular the difficulty in determining a data-based 
definition of "complexity" as just discussed; the apparent tendency for 
"higher" clusters to be more stable than "lower" clusters; and the 
broader cross-validation support for the hierarchy than for specific 
interpretations, adds suggestive support to the latter alternative over 
the former. 

Thus, the advent of taxonomic achievement tests raises some 
issues in connection with the analytic procedures and the interpretive 
procedures used for these tests. Whatever else, the results of this 
study have clearly shown that these tests produce a genuine taxonomy 
which might be improved by the systematic development of foils, and the 
use of the responses to these foils as information when evaluating 
and interpreting these tests and when evaluating, interpreting, and 
predicting the performance of the individuals taking them. 


The findings of this study suggest that tests which are clearly 
homogeneous regarding internal consistency may form a special case of a 
broader class of tests which have taxonomic properties. This conclusion 
has broad implications with respect to their use in the educational 
setting. 

To begin with, the use of the Guidelines would seem to have 
several practical advantages. First, they simplify item writing because 
they provide a systematic basis for writing a broader range of foils 
than can be made without them. Second, the Guidelines improve the basis 
for the reasons why a foil is wrong. Third, as research further extends 
the range of Guideline categories and refines their definition, it may 
be possible to increase the precision with which such concepts as 
analysis may be defined, further improving the construct validity of 
process-oriented taxonomic tests. 

Another advantage may arise from the extension of the Guidelines 
into the Misreading and Misclassification types of foil. Such an 
extension may link what is now known about diagnostic characteristics 
of tests in the areas of content-related performance and skill-related 
performance. This linkage may make it possible to extend the diagnostic 
aspect of testing beyond the knowledge and comprehension 
characteristics of reading and arithmetic into the more complex 
characteristics of mathematics and into the more esoteric subjects such 
as social science, and perhaps even literary appreciation where the 
subject matter is clearly open to multiple interpretations. 

If diagnostic testing can be coupled through research with 
improved definitions of educational objectives and the factors involved 
in their attainment, teaching could be more nearly like the practice of 
medicine. In medicine the practitioner classifies a set of characteristics 
(a syndrome) and uses his knowledge of the effective treatments 
available to remedy the condition. He then monitors the progress of 
the treatment. If all goes normally, the condition is corrected. If 
not, the practitioner modifies treatment (prognosis) and/or orders 
further tests to modify the classification (diagnosis) of the condition, 
and if necessary, calls in specialists to extend his knowledge 
resources, moves the patient to the hospital to extend his physical 
resources, etc. 


There are, of course, dissimilarities between medical and 
educational practice. Medical men deal generally with short term 
problems of a clinical dysfunctional nature. The treatment range at 
their disposal tends to be drastic and, when successful, dramatic in 
its effectiveness. Educators, on the other hand, generally deal with 
long term developmental situations. The procedures available are less 
spectacular, slower acting, and much more complex. However, learning 
research is now providing increasingly powerful tools for the educator. 
Among these are CMI (Computer Monitored Instruction) and CAI 
(Computer Assisted Instruction). These two procedures alone, along 
with the interpretation of right and wrong answers in the 
terms just indicated, might greatly extend the capabilities of educators. 

The essential problem with the bright picture just painted is 
that at the moment, it contains too many unanswered "ifs." The next 
section spells out some of the research which might be conducted to 
help to make this dream become a reality. 




There are several areas for further research suggested by this 
study. One of these involves the host of problems which an extension 
of test theory could generate. The solution to the "multiple 
interpretation" problem presented in this study, although suggested and 
strongly supported by the findings of this study, is probably only one 
of a range of possible solutions some of which may be more practical 
than others. Those individuals interested in mathematical statistics 
could pursue many avenues from this single problem. At a more practical 
level, there are a host of numerical analysis problems in the 
implementation of the particular procedure proposed in this study. 

Subsequent to the development of an effective 
analytical procedure, there is the possibility of a host of studies 
into the characteristics of specific tests and classes of test, into 
the conditions under which nominal and ordinal categories form, and into 
the types of relationship among categories which are normally found 
and the conditions under which these relationships occur. Second-order 
factoring of the centroid matrices seems a logical first step, 
but perhaps the entire structure could be integrated into a single 
analytic procedure and a single model. 

Another essential area for research involves the formulation 
and resolution of problems arising from the interpretation of clusters 
after their statistical characteristics have been determined. Attendant 
to this problem is the cross-validation of interpretation to 
independent samples of equivalent or nearly equivalent profile 
characteristics. A profile in this context is a set of clusters and 
its attendant statistical and interpretive characteristics. 

The formation of a generic model for a range of types of test 
opens the possibility for the computer generation of a test of particular 
construct characteristics derived from the past performance 
characteristics of a large number of items in a pool of items. With an 
even larger pool of items, the computer could generate and administer 
a branching type programme tailored to a wide range of individual 
differences with the aid of researchable adjustments to the test 
construction model. In this latter case, the computer could update its 
performance statistics on the item pool as students take the course, 
and in so doing refine its own course. 
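
The updating of performance statistics on an item pool, as described in the paragraph above, might be sketched as follows. This is a minimal illustration only and is not a procedure taken from this study: the class name `ItemStats`, its methods, and the choice of proportion-correct as the index of difficulty are all assumptions introduced here. The sketch keeps a running count of every alternative chosen, so that responses to foils are retained as information along with right answers.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ItemStats:
    """Running response counts for one multiple choice item (hypothetical helper)."""

    right_answer: str
    counts: Counter = field(default_factory=Counter)

    def record(self, chosen: str) -> None:
        # Update the statistics each time an examinee answers the item.
        self.counts[chosen] += 1

    def n(self) -> int:
        # Total number of responses recorded so far.
        return sum(self.counts.values())

    def difficulty(self) -> float:
        # Proportion of examinees choosing the right answer (assumed index).
        if self.n() == 0:
            raise ValueError("no responses recorded yet")
        return self.counts[self.right_answer] / self.n()

    def foil_proportions(self) -> dict:
        # Proportion of examinees choosing each wrong alternative (foil).
        if self.n() == 0:
            raise ValueError("no responses recorded yet")
        total = self.n()
        return {
            alt: count / total
            for alt, count in self.counts.items()
            if alt != self.right_answer
        }


# Usage: four examinees answer an item whose right answer is B.
stats = ItemStats(right_answer="B")
for response in ["B", "A", "B", "C"]:
    stats.record(response)
```

After these four responses, `stats.difficulty()` gives the proportion correct and `stats.foil_proportions()` gives the share of examinees drawn to each foil; a test construction model of the kind suggested above could consult both when selecting items.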


Another area for research could be the reworking of many of the 
studies related to educational methodology, evaluating the methods with 
the profile analysis procedure suggested here. Problems of matching 
teacher, method, student, and programme could be opened to detailed 
research, paving the way to the much broader use of diagnostic- 
prognostic practices in education than their present use. Studies of 
the relationships between achievement, personality, intellective and 
perhaps even genetic disposition variables would also seem reasonably 
possible from these small beginnings. 

Another area for research could be the precise definition of 
the developmental sequences through which children pass, the optimum 
ways of modifying these sequences toward specific goals and the degree 
to which these sequences can be modified. The charting of developmental 
patterns might lead to earlier and more precise identifications of 
specific talents. Also, the extension of the Guidelines to include the 
full range of academic performance might help to answer questions about 
the relative importance of content and of process in particular subject 
areas and for specific stages of development. 

Finally, there is the psychological question as to whether 
intellectual development is continuous and cumulative, or discrete and 
taxonomic, or some combination of these two. In this latter case, which 
aspects of intellectual development are continuous and which are 
discrete, and how do they interact? Can critical phases and critical 
experiences be identified and matched so as to extend human 
capabilities? 

This list is not exhaustive. It is left to the reader to 
extend it himself in keeping with his own special interests. 


TABLES 34, 35 & 36 

PHI COEFFICIENTS FOR THE WRONG ANSWERS 
FOR GROUP A 


APPENDIX B 

THE ADVANCE CLASSIFICATION OF THE ALTERNATIVES 
IN THE EXPERIMENTAL TEST 


The discussion which follows presents a detailed item-by-item 
account of the procedure used in the construction of this experimental 
test. The format of this discussion involves: 

1. Giving the reading selection as it is required. 

2. Giving each item in its entirety in the format it was given 
to the examinees; except that in this discussion the cat- 
egories of the items and the foils are indicated for the 
convenience of the reader. 

3. The reason for classifying the item by Bloom's Taxonomy, 
and the foils by the Guidelines as indicated, are given 
following each item. 

4. Items 19 to 24 form a special case and will, 
therefore, be dealt with as a unit. 

5. In the classification of foils no one item contained, by 
arbitrary practice, two foils from the same category. 


THE EXPERIMENTAL TEST 
Directions for Examinees: 
Answer all questions in Part I on the basis of the reading 
selections given. 

First Reading Selection 


Source: Dexter, Lewis Anthony; The Tyranny of Schooling, 
New York, Basic Books, 1964, p. 7. 


Most people in our society at one time or another suffer 
humiliation, shame, or at least severe apprehension because of 
one great fear: they are afraid that other people may think 
that they are stupid. This fear of being regarded as stupid 
frequently underlies inferiority complexes, self-contempt, 
self-depreciation, and despair. 

Our society teaches contempt for stupidity and fear of being 
regarded as stupid through one central institution and its 
auxiliaries. This institution is compulsory schooling. It is 
aided by such auxiliary practices as compulsory written 
examinations for admission to many jobs, intelligence testing, 
and the like. 

1. From the above article we may conclude that if society does 
not reduce its contempt for stupidity: 
(Bloom's Category 4.20) 


A. Emotional problems will continue to be on the increase. 
(OG) 


*B. The development of creativity will continue to be 
restricted. 


C. Mutual co-operation will continue to be difficult to 
obtain. (IA) 


D. Economic power will continue to be confined to a 
minority group. (Irr) 


This item was classified as an analysis (4.20) item on the 
grounds that it requires the examinee to display "skills in comprehending 
the interrelationship among ideas" (Bloom: p. 206). The examinee 
is expected to realize that, contrary to popular myth, creative people 
do not display “inferiority complexes, self-contempt, self-depreciation, 
and despair" to the same extent as is found in the population at large. 
For this reason, the development of creativity and the development of 
contempt for stupidity would be expected to be inversely related. 
Since, in the stem, the variable “contempt for stupidity" does not 
change, it follows logically that the status of any related variable 
should show no change. If the examinee did not know this relationship, 
he should have been able to arrive at it from the logic of the foils. 


Foil 1A (or 1D₁, on the basis of the symbolism used in 
Chapter V) suggests an increase in one variable without a corresponding 
increase in the other. The phrase "does not reduce" in the stem does 
not validly warrant the conclusion that contempt for stupidity will 
increase. It is adding incorrect information to the answer to suggest 
a change in one variable without an explicit statement concerning change 
in the appropriate direction in the other variable. Hence the foil is 
classified as an Overgeneralization (OG). 

Foil 1C (or 1D₂) assumes a functional link between co-operative- 
ness and contempt for stupidity. Unlike the case for creativity, there 
is no valid reason to assume such a relationship for co-operativeness. 
Hence this foil involves an Invalid Assumption (IA). 


Foil 1D (or 1D₃) assumes a functional relationship between 
contempt for stupidity and the confining of economic power to a minority 
group. For this reason this foil could have been an Invalid Assumption 
(IA) except for the arbitrary rule used for classifying foils which 
allows only one foil in a category per item. On the other hand, the 
concentration of economic power for the purpose of maintaining economic 
institutions is a practical necessity independent of how the society 
treats the individual or how the decision makers are chosen, so that this 
statement is true but irrelevant to the problem. Hence this foil was 
classified as an Irrelevancy (Irr). 
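
The foil code given in parentheses in the discussion above can be illustrated with a short sketch. It assumes, as the examples in this appendix suggest, that the foils of an item are numbered in the order in which the wrong alternatives appear, with the right answer skipped. The function name `foil_code` and its parameters are hypothetical and are not part of the original study; the subscript is written inline, so that the first distractor in item 1 is written 1D1.

```python
def foil_code(item: int, alternative: str, right_answer: str,
              alternatives: str = "ABCD") -> str:
    """Return the distractor code (e.g. '1D1') for a wrong alternative.

    Assumption: foils are numbered in order of appearance among the
    alternatives, skipping the right answer.
    """
    if alternative not in alternatives or right_answer not in alternatives:
        raise ValueError("unknown alternative letter")
    if alternative == right_answer:
        raise ValueError("the right answer is not a foil")
    # Keep only the wrong alternatives, in their printed order.
    foils = [a for a in alternatives if a != right_answer]
    return f"{item}D{foils.index(alternative) + 1}"


# Usage: item 1 has right answer B, so A, C and D are its three foils.
codes_item_1 = [foil_code(1, alt, "B") for alt in "ACD"]
```

Because the code depends only on the position of a foil among the wrong alternatives, the same foil receives the same code wherever the right answer happens to fall in the item.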


In the code 1D₁ used in Chapter V, the subscript stands for the 
first distractor (foil) in item 1. The code gives a standard 
procedure for identifying foils without concern for which 
alternative is the right answer. 


2. Which of the following is the most important causative 
factor of contempt for stupidity? (Bloom's Category 4.10) 


*A. Compulsory categorizing in school. 
B. Compulsory school attendance. (Sub) 
C. Compulsory written examinations. (OS) 
D. Compulsory intelligence testing. (Irr) 

Item 2 is asking for the "causative factor" which is not stated 
in the selection. This item, therefore, requires the examinee to 
demonstrate his "ability to recognize unstated assumptions" (Bloom: 
1956, p. 205), hence the classification of this item as analysis (4.10). 

Compulsory school attendance is an enabling factor in this 
situation, but it is neither necessary nor sufficient. In fact, compulsory 
school attendance is disjunctively related to the development of 
contempt for stupidity, which can develop in universities where 
attendance is not compulsory. Also, contempt for stupidity need not develop 
in a compulsory school system. Any attempts at the classification of 
an individual can lead to the development, on the part of the individual, 
of contempt for the forms of behavior considered "stupid" by that 
institution. (It can also have the opposite effect.) In any event, the 
replacement of this term "any" by the term "compulsory" makes this a 
Substitution (Sub) foil. 

With respect to foil 2C (2D₂), the cause of contempt for 
stupidity is the "pass-fail syndrome" which compulsorily classifies a 
certain proportion of the population as "stupid." In this case, it is 
not the written examinations but the use to which they are put which 
leads to contempt for stupidity. Furthermore, contempt for stupidity 
can (and does) develop in the classroom context at times other than 
during examination writing by the tacit acceptance by a student's peers 
of the assumption made by the teacher that making a mistake is "sinful." 
There would be no need for compulsory examinations if there were no 
attempt to make classifications. However, written examinations by 
themselves do not cause contempt for stupidity, hence this foil is an 
Oversimplification (OS). 

In the case of compulsory intelligence testing (Foil 2D or 2D₃) 
the issue is whether or not the results of these tests are used as part 
of the compulsory classification system rather than whether or not the 
tests are given. Therefore, this foil as stated is an Irrelevancy (Irr). 


3. The school acts as an agent for the continuance of contempt 
for stupidity by: (Bloom's Category 2.10) 


A. Placing too much emphasis on success in extra-curricular 
activities. (Sub) 


*B. Reflecting the attitude that personal worth is at stake. 


C. Encouraging competition between students of unequal 
ability. (Irr) 


D. Stressing knowledge as the only means to success. (OS) 

Dexter's approach to the schools is essentially upon an 
emotional level. In attributing a person's inferiority complex to the 
school his implication is that the basic strength of contempt for 
stupidity is in its reflection upon self-esteem. This item tests the 
examinee's "ability to understand nonliteral statements (exaggeration)" 
(Bloom: p. 204). The classification is 2.10. 

In foil 3A (3D₁) the phrase "in extracurricular activities" 
would have to be replaced by the phrase "in academic pursuits to the 
exclusion of success in self-corrective activities," to be correct. 
This foil contains a Substitution (Sub). 

Competition between students of unequal ability can lead to the 
continuance of contempt for stupidity provided that the purpose of the 
competition is to make the less able appear stupid. It is the 
objective and not the fact that is critical. The fact itself is 
irrelevant; therefore, this foil is an Irrelevancy (Irr). 

In foil 3D (3D₃) the critical aspect of this statement is "to 
the exclusion of success in self-corrective activities." Hence this 
foil is an Oversimplification (OS). Notice that foils 3A (3D₁) and 
3D (3D₃) are both related to a "correct" answer which is not given in 
this item. It would seem perfectly legitimate when more than one right 
answer is possible for a particular item to use alternative right 
answers for the generation of foils. 

4. The author, in charging that "society teaches contempt for 
stupidity and a fear of being regarded as stupid" by means 
of the school, is assuming that: (Bloom's Category 4.20) 


A. The school should not be an enforcing arm for the 
customs of society. (OG) 


*B. The school is a more powerful socializing force than 
the home. 


C. The home is a more powerful socializing force than the 
school. (Inv) 


D. The school is an enforcing arm of the customs of 
society. (Irr) 


This item asks the examinees to "recognize a hidden assumption" 
(Bloom: p. 206) which makes this an analysis (4.20) item. 

If the school is at fault it must have more influence on the 
child than the home has on the child. Foil 4C (4D₂) must be an 
Inversion (Inv) by virtue of being opposite to the correct answer. 

Foils 4A (4D₁) and 4D (4D₃) are related since both contain the 
same irrelevant premise. However, 4A (4D₁) also contains the additional 
unwarranted value judgment "should not be." By virtue of the rule which 
excludes category repetition, it becomes reasonable, at least 
superficially, to classify 4D (4D₃) as an Irrelevancy (Irr) and 4A (4D₁) as 
an Overgeneralization (OG). In the case of 4A (4D₁), however, this 
Overgeneralization is an unreasonable extension of a statement which, in 
itself, is incorrect, suggesting that second thoughts might have led to 
a more reasonable classification of this foil, as the results of 
subsequent analysis showed. 


Second Reading Selection 


Source: Marris, Peter; The Experience of Higher Education, 
London, Routledge & Kegan Paul, 1964, p. 175. 


In this sense, it does not matter what subject a student 
studies, since each is leading towards a generalized intellec- 
tual awareness. But the starting point is still important since 
a student has the greatest incentive to understand whatever 
relates most immediately to his interests. Nor are the concepts 
derived from any one field of study equally relevant to any 
others: the ramification of insights remains biased by its 
roots. The intellectual content has to both guide and be guided 
by the purposes for which a student seeks understanding. 
Otherwise it is meaningless. 


If, then, higher education aims to teach students how to 
abstract, from a particular context, principles by which they 
can organize the perception of their universe of thought, it 
requires that these students have a use for such free-ranging 
understanding. When they enter higher education, their aims 
are confused, and they may not see, or wish to see, the value 
of a generalized intellectual skill. Their approach to learning 
has been conditioned by extraneous motives: they worked to win 
approval or avoid blame, to pass an examination, as much as or 
more than for the sake of understanding. They are not used to 
asking themselves what they want to understand, or why, but 
derive enough interest to master the skills required of them 
from a desire to satisfy the authority who sets the task. So, 
I think, the function of higher education is as much to develop 
the autonomy of their desire to understand, as to satisfy it. 


5. The author suggests that a generalized intellectual 
awareness can be achieved by: (Bloom's Category 3.00) 


A. Focusing on progressively more difficult topics in a 
subject. (IA) 


B. Teaching the students how to generalize from specific 
content. (CM) 


C. Presenting highly abstract material which is extensive 
in scope. (OS) 


*D. Presenting any subject matter in any predetermined 
sequence. 


This item was classified as an application (3.00) item because it 
requires the examinee to make "use of abstractions in particular and 
concrete situations" (Bloom: p. 205). The right answer is also, in fact, 
an Oversimplification (OS) because Marris says "the starting point is 
important." However, the use of this phrase in 5D would 
have produced a "clang association," which Thorndike and Hagen (1961) 
point out should not be used (see p. 28). Alternative 5D is, 
nonetheless, the most nearly correct of the four alternatives. 

Foil 5A (or 5D₁) assumes that 1) some specific 
subjects are needed for the development of generalized intellectual 
awareness and that 2) instruction must, of necessity, begin with the 
"easier" topics first. Both of these assumptions are explicitly stated 
as invalid in the selection. This foil was classified as an Invalid 
Assumption (IA). 

The relationship between generalized intellectual awareness and 
inductive reasoning as suggested in 5B (5D₂) is a very common 
oversimplification. With another oversimplification foil in the same item, 
the classification of this foil as a Common Misconception (CM) would 
seem to be quite reasonable. 

Similarly, the identification of generalized intellectual 
awareness with the transferability of content is an oversimplification 
of the topic of Marris' (1964) discourse. Therefore, foil 5C (5D₃) is 
classified as an Oversimplification (OS). 


6. The purpose of developing a generalized intellectual 
awareness is to: (Bloom's Category 2.30) 


*A. Promote thinking ability which is not contextually 
bound. 


B. Enable an individual to master any subject area. (OG) 


C. Stimulate thinking ability within the individual's 
chosen field. (Sub) 


D. Give the individual an ever-widening view of his 
world. (Irr) 


In order to answer this question, the examinee is expected to 
make an "extension of trends or tendencies beyond the given data." 
(Bloom: p. 205). On this basis this item was classified as 
Comprehension (2.30). 

In foil 6B (6D₁) the mastery of "any subject area" is far too 
broad a statement for the purpose of Marris' (1964) discussion. Hence 
this foil is an Overgeneralization (OG). 

In the case of 6C (6D₂) the phrase "within the individual's 
chosen field" is substituted for the phrase in the correct answer 
"which is not contextually bound," making this a Substitution (Sub) 
foil. 

For foil 6D (6D₃) the absence of context in the right answer 
renders the Weltanschauung (world-view) aspect of this foil irrelevant, 
hence 6D (6D₃) was classified as an Irrelevancy (Irr). This foil could 
also be a Word-Word Link (WW) because of similarities in phraseology 
between "ever widening view" and "free ranging understanding." 


7. Of the following, the best example of generalized 
intellectual skill is: (Bloom's Category 3.00) 


A. Thinking within the confines of particular subject 
areas. (Sub) 
B. Generalizing from the concrete to the abstract. (WW) 
*C. The widely applicable technique of logic. 
D. Applying abstract principle to new situations. (OS) 

The phrase "best example" in the stem led to the classification 
of this item as an Application (3.00) item. 

There are strong similarities between this item and the previous 
one. For instance, the "particular subject areas" phrase is similar to 
the "individual's chosen field" in item 6, so that foil 7A (7D₁) should 
also be classified as a Substitution (Sub). With respect to 7B (7D₂) 
the use of induction is similar to 5B (5D₂). In this case, however, the 
Word-Word Link between "generalized" in the stem and "generalizing" in 
the foil is somewhat stronger because of the context than in item 5. 
Hence 7B (7D₂) was classified as a Word-Word Link (WW). 

Also, the confusion in equating transfer of training with 
generalized intellectual awareness found in 5C (5D₃) reappears in 7D 
(7D₃), which makes it reasonable to regard this foil as an 
Oversimplification (OS) as well. 


Third Reading Selection 


Source: Kagan, J. and Moss, H. A.; Birth to Maturity, 
New York, Wiley, 1962. 


Aggression is a second behavior system that begins its growth 
during the first five years. Traditionally a response was 
labeled aggressive if the goal of the behavior was assumed to 
be psychological or physical injury to a person or person 
surrogate. We have adhered to this definition. As with dependency 
the display of aggressive acts is a regular concomitant 
of development. The slapping or pushing of an age mate, the 
destruction of a sibling's new fort, and the stinging verbal 
attack are regularly observed in the behavior of many children. 


Aggression, like dependency, is subject to socialization 
pressures, for the child does not have complete license to 
unleash his anger when he chooses. In addition, as with 
dependency, the occurrence of overt aggression is a function of 
both the threshold for motive arousal and the intensity of 
anxiety associated with direct expression of this behavior. 

In contrast to dependency, however, the potential for 
conflict over aggression is greater for females than for males. 
The pattern of social rewards and traditional sex-role standards 
act in concert to discourage the direct expression of aggression 
in girls and women. It might be anticipated, therefore, that 
aspects of aggression would be more stable for males than for 
females. This is precisely what occurred, for overt aggression 
to mother and frequent tantrums during childhood predicted adult 
aggressivity for men but not for women. 

8. If the school were to encourage tolerance for honest 

mistakes, we would expect aggression to: 
(Bloom's Category 4.10) 

A. Diminish somewhat. (IA) 
*B. Take different forms. 

C. Disappear completely. (0G) 

D. Remain unchanged. (Inv) 

Although this item makes a reference to the first reading 
selection (Stupidity), the question can be answered within its own 
context. For this reason this item was not classified as synthesis but 
as analysis (4.10). The logic of this item revolves around the assumption 
of the author that aggression is an innate characteristic of human 
beings which can be modified but not diminished. Thus 8A (8D₁) can only 
be correct if this assumption is violated. Foil 8A (8D₁) would seem to be 
an Invalid Assumption (IA) foil in the sense that the examinee must 
make an invalid assumption to select this answer as "correct." Foil 8C 
(8D₂) strongly overstates the same error as found in 8A (8D₁). For the 
same reason as 4A (4D₁) this foil was classified as Overgeneralization 
(OG). 


Changes in the psychological climate will lead to changes in the 
modes of expression of aggression which makes foil 8D (8D₃) an Inversion 
(Inv). 


9. The basic position of the author in writing about aggression 
is that it: (Bloom's Category 4.20) 


*A. Is inevitable but can be directed through socialization. 


B. Can be eliminated through the process of socialization. 
(Inv) 


C. Will result in internal conflict independent of the 
environment. (Sub) 


D. Is crippling to the individual by wasting considerable 
energy. (CM) 


This item requires the recognition of the assumption of the 
authors which was indicated in relation to item 8. For this reason the 
item was classified as analysis (4.20). 

Foil 9B (9D1) is an Inversion (Inv) for the same reason as
8D (8D3).

Aggression, as an innate behavior system, will produce internal
conflict only if its modes of expression are frustrated. Hence,
internal conflict is "dependent upon environmental conditions" rather
than being "independent of the environment." Since there is already an
Inversion (Inv) foil in this item, another category had to be found for
this foil. Comparing the two statements "dependent upon..." and
"independent of...," the latter phrase can be treated as a replacement
for the former. Therefore, this foil (9C or 9D2) was classified as a
Substitution (Sub).

Aggression can be harmful, but as one basis for intrinsic
motivation it can be constructive as well. Foil 9D (9D3), by treating
aggression as being exclusively harmful, oversimplifies the situation.
This oversimplification is so commonly held that this foil was
classified as a Common Misconception (CM).

10. With which of the following statements concerning
aggression would the author be most likely to agree?
(Bloom's Category 6.10)

A. Aggression is like dependency in that it is harmful to
personality development. (CM)

B. Aggression generally interferes with the attainment of
educational goals. (Inv)

*C. Aggression is potentially useful for educational
purposes.

D. Aggression is considered to be a response to threats
to a person or person surrogate. (WW)


This item asks the examinees to evaluate the statements made in
the alternatives against the information given by the authors about the
topic; therefore this item was classified as Evaluation (6.10).

Foil 10A (10D1) was classified as a Common Misconception (CM)
on the same basis as foil 9D (9D3).

As pointed out with reference to item 9, aggression can be one
of the bases for intrinsic motivation. Hence the possibility that
aggression may be potentially useful for educational purposes may be
inferred from the selection. In this case the opposite statement, as
in 10B (10D2), must be an Inversion (Inv).

Foil 10D (10D3) is best classified as a Word-Word Link (WW) on
the basis of the phrase "person or person surrogate." This foil is
wrong because it contains the phrase "a response to" which is
extraneous to the definition of aggression (see: p. 1s).


11. Overt aggression would likely be decreased by:
(Bloom's Category 3.00)

A. Blocking of many modes of aggression. (Inv)

B. Lessening the threat of punishment. (OS)

*C. Increasing the threshold of motive arousal.

D. Motivating people to rise above their peers. (RT)

This item was classified as an application (3.00) item because 
it asks for a practical method of behavior change. 

Blocking of modes of aggression (11A or 11D1) would be expected
to intensify responses in the remaining available directions; hence
this would not necessarily decrease overt aggression. This foil was
classified, therefore, as an Inversion (Inv).

If the threat of punishment is lessened, overt aggression may
or may not temporarily increase, depending upon the amount of
frustration previously developed and the way in which the threat is
lessened. If the release leads to increased frustration, the increase
in overt aggression would continue. On the other hand, if lessening
threats also lessened frustration and provided for alternative modes of
expression, overt aggression could decrease. Foil 11B (11D2) must be
considered an Oversimplification (OS) in this context.

Rising above one's peers may involve the use of aggression as
intrinsic motivation, but it can also be accomplished by the use of
overt aggression or the threat of its use (i.e. intimidation). The term
"motivating" in this sense refers to "behavior modification" rather
than motivation in the sense used by psychologists. Foil 11D (11D3)
was classified as a Redefinition of Terms (RT).


12. Aggressive behavior in female children is: 
(Bloom's Category 2.30) 


*A. More likely to produce guilt feelings than in males. 


B. A less likely occurrence than in males of the same 
age. (CM) 


C. More unpredictable and is expressed differently than
in males of the same age group. (OG)

D. Less differentiated in expression than in males of the
same age. (Inv)


Once again the examinee is expected to go beyond the given
information in order to recognize the role of guilt in child rearing
practices. For this reason this item was classified as comprehension
(2.30).

Since aggression in males and females tends to take different
forms because of sex differences in child rearing practices, guilt is
more likely with females. Both sexes show aggression. The difference
in mode of expression leads to the common misconception that girls are
less aggressive than boys, hence the classification of 12B (12D1) as
a Common Misconception (CM).

The first two words in 12C (12D2) make this statement false.
The lack of predictability is from childhood to adulthood and not
across peer groups. This foil is therefore classified as an
Overgeneralization (OG).

Foil 12D (12D3) is exactly opposite to the true state of
affairs, making this foil an Inversion (Inv).

13. If we assume that the school increased its use of contempt
for stupidity as a motivating device, we would expect that:
(Bloom's Category 5.30)


A. Parental pressure would intervene to prevent the 
school from making this change. (IA) 


*B. Overt aggressive behavior would increase and 
autonomous thinking would decrease. 


C. Both academic success and generalized intellectual 
awareness would increase. (CM) 


D. The level of student motivation would decrease rather 
than increase. (RT) 


This item is classified as a Synthesis item (5.30) because it
involves more than one reading selection in order to achieve the answer.

Foil 13A (13D1) involves the assumption that parents would
oppose this move. However, contempt for stupidity could not be used at
present if it did not have at least the implied support of parents.
The major supporters of the school system, the middle class, want to
keep the "riff-raff" out of the professions as unwanted competition for
the aspirations they have for their own children. Contempt for
stupidity, as Dexter (1964) implies, is an extremely effective method of
destroying the academically unfit. It is unlikely that powerful parents
would oppose this move. This foil was, therefore, classified as an
Invalid Assumption (IA).

Contempt for stupidity has the effect of maintaining the
dichotomy between the academically successful and the others.
Increasing this pressure would sharpen the dichotomy and would not
necessarily increase the academic success of the survivors and would
most certainly not increase the academic success of those who did not
survive. On the other hand, the overt effect of this increase would be
to produce an apparent increase in academic standards which would be
expected to lead to the common misconception that making school
achievement more difficult improves the quality of schooling. For these
reasons foil 13C (13D2) was classified as a Common Misconception (CM).

Foil 13D (13D3) redefines motivation in the narrow sense of
positive intrinsic motivation (i.e. interest). The use of contempt for
stupidity is, in fact, increasing the level of extrinsic aversive
motivation. This foil was classified, therefore, as a Redefinition of
Terms (RT) foil.

14. Which of the following best describes the probable
relationship between contempt for stupidity and generalized
intellectual awareness? (Bloom's Category 5.30)

A. Changes in either will have no effect on the other.
(Irr)

*B. As one increases the other will decrease.

C. Either will increase with an increase in the other.
(Sub)

D. Contempt for stupidity should be reduced and awareness
should be increased. (OG)


This item, once again, involves two selections (Contempt and
Awareness). For this reason it was classified as a synthesis (5.30)
item. A person who is motivated by contempt for stupidity (his own and
others') would be expected to be constantly en garde against making
mistakes. Such an orientation toward his own behavior would tend to
make him intellectually cautious and hence less inclined to the
expansive thinking needed to develop a generalized intellectual
awareness. These two variables would be most likely to be inversely
related, thus explaining the correct answer.

Treating these two variables as unrelated, as in foil 14A (14D1),
is contradicted by the information in these two passages. This
statement could be true in another (independent) context, hence the
Irrelevancy (Irr) classification of this foil.

One part of the statement in foil 14C (14D2) is correct, the
other incorrect; hence this foil was classified as a Substitution (Sub).

Foil 14D (14D3) contains an unwarranted value judgment when only
the relationship and not its psychological importance is asked for.
This foil was classified as an Overgeneralization (OG).

The classification of the foils in this item is difficult
because three of the alternatives are based to a large degree upon
logical relations rather than errors in logic. The three possible
relationships, direct (14C), inverse (14B) and unrelated (14A), form
the basis for three of the four alternatives. It might have been more
reasonable to have classified 14A (14D1) and 14C (14D2) as Other (O)
than to attempt to establish classifications on the basis of the rather
tenuous arguments given here.


Fourth Reading Selection 


Source: Dinkmeyer, D. C., Child Development, Englewood Cliffs,
N. J., Prentice-Hall, 1965, p. 59.





The social studies committees were working on their reports.
Doris was chairman of the southern states committee which
included Jack, Susan, and Bill. There seemed to be confusion in
this group so I decided to investigate. "Jack won't
co-operate," complained Susan. "What do you want him to do?" I
asked. Jack was frowning. "They say I have to study economic
conditions in the states, and I am interested in state
capitals," said Jack. "Did you volunteer to take economic
conditions?" I asked. "There wasn't a chance to volunteer.
We were just told her plans," answered Jack. "Is anyone
investigating the state capitals?" I asked. The children
indicated this job had not been assigned. "In that case, does
the group mind having Jack study the capitals?" No one seemed
to care. "What about the rest of you--are you all satisfied
with your jobs?" They were. Jack went to the reference shelf
and started to read. "BR. oH"





15. From this report we may infer that the: 
(Bloom's Category 2.20) 


A. Classroom is very well equipped with instructional
materials. (OG)

*B. Classroom probably has moveable seats.

C. Class is studying the Southern United States. (WW)

D. Teacher favours voluntary participation. (RT)


This item involves a "reordering, rearrangement, or new view of
the material" (Bloom: p. 205) which explains its classification at the
(2.20) level.

Foil 15A (15D1) overstates the situation in the phrase "very
well equipped," proposing an inference which goes further beyond the
information in the passage than is reasonable. This foil was
classified, therefore, as an Overgeneralization (OG).

In foil 15C (15D2), only one committee and not the whole class
seems to have been studying the southern United States. Since this
item already has an OG foil, the next most reasonable is a Word-Word
Link (WW) on the grounds that careless reading might lead to this word
association.

In foil 15D (15D3) the term voluntary is used in more than one
sense. Actually, the teacher has substituted her own arbitrary
decision for Doris' decision. This foil could also have been a WW foil
except that 15C (15D2) fits this category better. For these reasons
foil 15D (15D3) was classified as a Redefinition of Terms (RT).

16. If the teacher had written "Doesn't work well with others,"
as an anecdotal record for the above incident, this would
have been: (Bloom's Category 6.10)

A. Better; it says the same thing with less words. (Irr)


*B. Worse; it fails to indicate the circumstances of the 
incident. 


C. Better; the details of the event are unnecessary when
judging Jack's behavior. (OG)


D. Worse; teachers are failing in their obligations in 
not supplying complete information. (CM) 


This item clearly asks for a value judgment based upon the
evidence in this reading selection which makes it an Evaluation (6.10)
item.

Since it should be evident by comparison of the two alternative
descriptions given for this same incident that the function of an
anecdotal record is to give a clear picture of an event for future
reference, a goal of "saying the same things in less words" is
irrelevant to the task at hand. For this reason foil 16A (16D1) was
classified as an Irrelevancy (Irr).

Adding unwarranted statements concerning judgment of behavior
makes this foil, 16C (16D2), an Overgeneralization (OG).

Once again, inappropriate value judgments are involved in 16D
(16D3), this time directed at the teacher rather than the student.
This attitude is so common that this foil was treated as a Common
Misconception (CM).


17. From the above passage we can infer that Doris' leadership
of the group was: (Bloom's Category 4.20)

*A. Coercive.

B. Autocratic. (OG)

C. Destructive. (Inv)

D. Laissez-faire. (Sub)

In item 17 the examiner is expecting the examinee to "comprehend
the interrelationships among ideas in a passage" (Bloom: p. 206).
Hence the analysis (4.20) classification.


On the basis of the argument that the successful autocrat would
not tolerate contradiction and would therefore meet no overt objection
to his or her decisions, Doris' leadership was considered as "coercive"
rather than "autocratic" for the best answer. Notice, by the way, that
the teacher is a successful autocrat in this passage. For these reasons
foil 17B (17D1) was regarded as an Overgeneralization (OG).

It is not evident from the passage that Doris' leadership was
destructive. In fact, she apparently had the support of the two members
of the committee, one of whom reported the problem to the teacher. Being
opposite to a possible best answer, foil 17C (17D2) was classified as an
Inversion (Inv).

Since Doris' attempts to coerce Jack were ineffective, she
permitted the teacher to intervene. As a result, her later leadership
was laissez-faire, but only under the arbitrary intervention of the
teacher. The replacement of her later performance for her former
performance led to the classification of this foil (17D or 17D3) as a
Substitution (Sub).

Once again, the classification of these foils is tenuous and
open to disagreement. The format of these foils also deviates from the
usual format of foils in this test, as in the case of item 13 and
items 19 to 24 inclusive. It is possible that an "Other" (O)
classification of these foils would have been more reasonable.

18. From the description of the incident, we can conclude that
the teacher's handling of the incident was:
(Bloom's Category 6.10)

A. Good; she intervened to prevent a serious conflict from
continuing. (Sub)

*B. Poor; she allowed Jack to use her authority as a lever
to get his own way.

C. Good; she resolved the problem to the mutual
satisfaction of the group. (Irr)

D. Poor; she failed to collect sufficient information
before enforcing a decision. (OG)


This item created a good deal of consternation upon its first
administration. At issue seemed to be philosophical differences between
the examiner and the examinees. It is probable that this problem is
a complication which is inherent in any multiple choice
Evaluation (6.10) item where the evaluative criteria are not supplied.
The problem arose essentially because many examinees insisted that the
role of the teacher was to prevent or to eliminate conflict. In
this case either 18A or 18C would be correct depending upon the
interpretation given to the phrase in the reading selection "no one
seemed to care."

The examiner, on the other hand, took the stand that the
function of the teacher is to educate. If conflict arises the conflict
should be used in an educational manner. In this case, the teacher
should have found out why the topic of economics was important enough
to the group to have engendered the conflict. Once Jack understands its
importance he may agree to do it. That is, the teacher helps to improve
communication. If, on the other hand, Jack remains adamant, forcing him
to do something disagreeable to him may not help. In this case, the
reorganization of committees more nearly upon sociometric lines might
improve the situation. There may be some personality reasons for Jack's
behavior. In this case, the teacher's long-term role is to help Jack
cope with his own and others' personalities. Letting Jack have his own
way does not help meet this latter goal. Hence the keyed answer.

Some of the examinees argued that they had insufficient
information to answer this question. This argument was discounted
because the differences still seemed to be philosophical. In any case,
the few people who chose the keyed answer had the highest average
total-correct score, which meant the retention of this item.
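
The retention check described above (the examinees who chose the keyed
answer had the highest average total-correct score) can be sketched as
follows. This is a minimal illustration only: the function names, the
option letters, and the scores are hypothetical and are not data from
this study.

```python
from collections import defaultdict


def mean_total_by_option(choices, totals):
    """Return the mean total-correct score of the examinees choosing each option."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for choice, total in zip(choices, totals):
        sums[choice] += total
        counts[choice] += 1
    return {option: sums[option] / counts[option] for option in sums}


def keyed_answer_has_highest_mean(choices, totals, key):
    """True when the group choosing the keyed answer has the highest mean total."""
    means = mean_total_by_option(choices, totals)
    if key not in means:
        return False  # nobody chose the keyed answer
    return all(means[key] > m for option, m in means.items() if option != key)


# Hypothetical data: one chosen option and one total-correct score per examinee.
choices = ["A", "C", "B", "A", "C", "B", "D", "A"]
totals = [14, 15, 24, 16, 13, 22, 12, 15]

print(mean_total_by_option(choices, totals))
print(keyed_answer_has_highest_mean(choices, totals, key="B"))
```

A strict comparison is used here, so a tie between the keyed answer and
a foil would not count as support for retention; that choice is an
assumption of the sketch, not a rule stated in the text.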

The prevention of conflict rather than the educational use of
conflict led foil 18A (18D1) to be classified as a Substitution (Sub).
With education as a goal, the mutual satisfaction of the group (as in
18C or 18D2) is irrelevant, hence the foil was considered an Irrelevancy
(Irr).

The classification of 18D (18D3) as an Overgeneralization (OG)
is somewhat arbitrary. The teacher could have sought more information,
but the essential problem is her use of the information which she
obtained. Since 18C (18D2) was already classified as an Irrelevancy
(Irr) some other classification was needed.


Fifth Reading Selection 


Source: Prescott, D. A., The Child in the Educative Process,
N.Y., McGraw-Hill, 1957, pp. 125-126.


Progress Report 


X Attendance Area    Y County Schools

Name: Chester M.    Teacher: Miss C.    Grade: 6
Days Absent: 0    Days Tardy: 0


Reading: Is reading independently on the third-grade level and
instructionally on the fourth-grade level. Does not enjoy
reading. Finds many excuses to leave reading to do something 
else. Has trouble understanding what he reads. Is better able 
to find facts than to interpret facts. Has trouble finding 
words in context when meaning is given. 


English Language: Has a wide speaking vocabulary. Uses correct 
English. Does not enjoy story writing. Understands sentence 
construction. 


Spelling: Learns words in spelling lessons and uses them in 
written work. Enjoys spelling. 


Writing: Spaces words well. Is practicing again on the 
Omani Lo Ow Ierwews, IS) ew Mes Win Wwoewiwem wore, lime sec 
often. 


Arithmetic: Has worked again this year on addition, subtraction 
and multiplication. Had some trouble with subtraction. Is not 
ready for division. Has had experience with problem solving. 
Enjoys arithmetic. 


Social Studies: (History, geography, and civics). Has worked
with maps. Takes part in discussion. Showed interest in a
study of his community. Shared materials. Is trying for a 
better relationship with classmates. 


Science: Experimented with the force of air. Has become
interested in cloud formation. Likes dogs.


Music and Art: Listens to music. Takes part in singing and 
rhythms. Enjoys all phases of music. Works with clay, wood, 
paints, and fingerpaints. Enjoys all media of art expression. 


Instruction for Questions 19 to 24 


Based on the above progress report answer the next six questions
by marking:


A. If the hypothesis is supported by the facts. 
B. If the hypothesis is implied by the facts. 
C. If the hypothesis is refuted by the facts. 


D. If the hypothesis cannot be tested by the facts. 
(Bloom's Category 4.20) 


Hypotheses to test: 


19. Chester is not liked by the other children; he avoids 
trying to read because he doesn't want them to see him 
sehr alee 





20. Chester lacks character. He does all sorts of bad things 
and will not discipline himself to learn to read because he 
has not been punished enough. 


21. Chester is growing very slowly and really is quite immature 
for his grade. Everyone expects too much of him. 


22. Chester has no real reason to want to read, since no one
ever reads at home. 


23. Chester's reading deficiency has not yet begun to affect 
seriously his performance in other areas. 


24. Chester's mother has kept after him about reading until he
hates it. 


All six of these items were classified as Analysis items (4.20) 
because of the hypothesis testing characteristics of their format. 


This format was used as a marker for analysis subtests. However,
because of the format, the classification of the foils became
problematic. The simple expedient was used of classifying all the foils
for these items as Other (O).


25. The most useful suggestion to help Chester is: 
(Bloom's Category 4.20) 


A. To give Chester personal warmth, acceptance and support 
wherever it is appropriate. (Sub) 


*B. To give Chester concrete help in getting started on 
specific tasks, especially in reading. 


C. To give Chester responsibilities and roles of 
acknowledged importance in the daily life of the 
classroom. (Irr) 

D. To try to get Chester's mother to take the pressure
off him and offer him more opportunities for
self-direction. (IA)

The examinee is expected to comprehend interrelationships in the
answering of this item, hence its analysis (4.20) classification. Foil
25A (25D1) substitutes emotional support for corrective instruction,
hence the Substitution (Sub) classification of this foil.

The treatment suggested in 25C (25D2) has no bearing on his
academic needs; it was therefore classified as an Irrelevancy (Irr).

There is no evidence in this selection that there is an
unreasonable pressure on Chester by his mother; hence this foil, 25D
(25D3), involves an Invalid Assumption (IA).

26. If additional information on Chester is desired, and none
of the following had been attempted, which one would
provide the greatest amount of immediately useful
information? (Bloom's Category 5.20)

A. An interview with Chester's previous teacher. (O)

*B. An interview with the parents.

C. A diagnostic test in reading skills. (OS)

D. A request for the assistance of a guidance counselor.
(Irr)


This item involves the examinee generating a structure to 
represent Chester's entire situation by inductive reasoning prior to 
answering the question. For this reason, this item was regarded as a 
synthetic (5.20) item. 

The best first hand source of information about Chester is his
parents. The next best is his previous teacher. Since the Guidelines
do not make any provision for this kind of relationship, foil 26A (26D1)
is best classified as "Other" (O).

A diagnostic test in reading skills is only useful to a teacher
who knows enough about these tests and the reading problems they
diagnose to be able to use them effectively. Also, administering and
interpreting such tests can be time consuming. It is an
Oversimplification (OS) for foil 26C (26D2) to suggest this course of
action to be superior to any other.

Most of the examinees were experienced teachers, hence it was
reasonable to assume that they would know from experience that guidance
personnel rarely can give a teacher information the teacher does not
already know. This effect occurs because test batteries are rarely more
reliable than a month or two of sensitive observation by a teacher.
Therefore, the course of action suggested in foil 26D (26D3) is
classified as an Irrelevancy (Irr).


27. A reasonable conclusion which can be drawn from this
report is that: (Bloom's Category 4.10)

A. Chester's problem stems essentially from his poor
relationship with his mother. (IA)

B. Chester's problem stems essentially from his poor
relationships with his peers. (Irr)

*C. Chester's problem has no single cause and no simple
solution.

D. Chester's problem stems from such a wide range of
sources that a classroom solution is impossible. (CM)


This item is somewhat difficult to classify because "drawing
conclusions" is not part of Bloom's Taxonomy. However, this item also
involves the "skill in distinguishing facts from hypotheses"
(Bloom: p. 205), hence the analysis (4.10) classification of this item.

Once again, the mythical poor relationship with his mother is
introduced in foil 27A (27D1). In this context the most reasonable
classification of this foil would be an Invalid Assumption (IA).

Chester's problem seems to be centered upon his reading
difficulty. His relationship with his peers may influence his
motivation to attempt improvement, but is irrelevant to his problem.
Therefore foil 27B (27D2) was classified as an Irrelevancy (Irr).

Foil 27D (27D3) overgeneralizes to an extreme level which made
the most reasonable classification for this foil a Common
Misconception (CM).

28. In Chester's progress report, which one of the following is
the most important factor contributing to his difficulty
with school achievement? (Bloom's Category 5.30)

*A. Aggression which is building up due to frustration
over his reading development.

B. His inability to develop a generalized intellectual
awareness. (Irr)

C. His weakness in reading which is affecting all areas
of learning. (OG)

D. The teacher has been using "contempt for stupidity" as
a motivating device. (IA)


This item, once again, involves more than one reading selection
and was therefore treated as a synthesis (5.30) item.

Foil 28B turned out to be somewhat unreasonable because the
author assumed that the examinees would be able to identify the fact
that generalized intellectual awareness is an adult phenomenon. The
evidence is tenuous since the Awareness selection, from its title, is
related to university education, and Chester in this (Progress)
selection is in Grade six. This foil would be an Irrelevancy (Irr) but
it may have been unreasonable to expect so tenuous a connection to be
made if it could not be assumed that the examinees would know this fact.

The progress report indicates that there are some areas of
Chester's development which are not out of step, which makes foil 28C
(28D2) an Overgeneralization (OG).

Foil 28D (28D3) suggests that this teacher is using contempt for
stupidity for motivation, which may be an Invalid Assumption (IA) since
the tone of the report is supportive rather than condemnatory.

29. On the basis of the foregoing, which of the following seems
to be the most important consideration when preparing
anecdotal records or progress reports?
(Bloom's Category 5.30)


A. Make no attempts at interpretation since your judgments 
are probably biased. (CM) 


*B. Present as much information as possible about all the 
salient aspects of the situation. 


C. Be as brief as possible, giving no information which
may cloud the central problem. (OS)


D. Do not put anything into these reports which might 
antagonize the child's parents. (RT) 


This item also involves more than one reading selection and
therefore it was classified as a synthesis (5.30) item.

In the case of foil 29A (29D1), it is impossible to avoid
judgments in any reporting, hence this advice is a Common Misconception
(CM). Because of the problem of observer bias, as much pertinent
information as is possible should be supplied so that alternative
interpretations can be considered by others. This latter statement
makes foil 29C (29D2) an Oversimplification (OS).

In foil 29D (29D3) the term report referring to "progress
report" is redefined by the suggestion that such a report may become
public property, i.e. it will be part of the "report card" to the
parents. This approach involves a Redefinition of Terms (RT).

30. The most important principle illustrated in this set of
questions is that the teacher should:
(Bloom's Category 5.30)

A. Promote generalized intellectual awareness in
aggressive children by using contempt for stupidity
as a motivating force. (WW)

*B. Recognize that developmental deficiencies arise from
complex circumstances, requiring multiple-strategy
solutions.

C. Seek professional assistance from the school counselor
in the identification of developmental problems. (Tr)

D. Recognize that "contempt for stupidity" is not
necessarily an effective way of generating motivation
in pupils. (RT)

This item synthesizes the previous twenty-nine which led to its 
synthesis (5.30) classification. 

Foil 30A (30D1) is a good example of a contrary-to-fact
statement developed by the glib use of the repetition of phrases from
the reading selections. It illustrates very well the way in which
Word-Word Link (WW) foils might be generated.

The discussion concerning the role of the counselor which
occurred on page 66 suggests that it would be more reasonable to get the
teacher to identify the problem and then get the professional's help in
its treatment. However, this approach means changing the role of the
counselor from diagnostician to prognostician. The relationship
recommended by foil 30C (30D2) may, therefore, be the reverse of the
ideal, and this foil was classified as a Transposition (Tr) for these
reasons.

One of the important characteristics of the so-called "Puritan
Ethic" is its heavy reliance on aversive extrinsic motivation (i.e.
punishment) as a means of regulating behavior. Contempt for stupidity
is an extremely effective method of motivation in this context if the
ill effects of aversive motivation are ignored. Furthermore, there is
a tendency, for political reasons, as already suggested, for the
strongest supporters of the school also to support this method of
motivation. On this basis, foil 30D (30D3) is clearly incorrect. For it
to be considered correct, the term "motivation" must be confined to
intrinsic positive motivation (or interest), making this foil a
Redefinition of Terms (RT).

In general terms, the overall classification of the foils as
reported here is probably open to considerable disagreement, as
suggested by the low interrater reliability. Would other researchers
have produced superior results? One of the raters of the items made the
following classifications:

1. 6D, was classified as a Word Word Link (WW). 


3 


2. 10D, was classified as the correct answer.


) 
3. 15D, was classified as an Invalid Assumption (IA).

In this case his first classification turned out to be
supported by the cluster analysis (see: p. ...); his second was
incorrect for the reason given (see: p. 165); and his third was not
supported by the cluster analysis (see: p. ...).
This success ratio of one out of three is equivalent to that of the
experimenter (see: p. 75).

It should also be noted that the nature of the examination is, 
at least in part, predetermined by the examiner's philosophy of 
education. This characteristic of the examination is evident in the 
first place in the nature of the reading selections upon which the 
examination is based. Second, it is evident in the nature of the 
questions asked concerning these reading selections. Third, it is 
evident in the reasons given for the classification of items and foils. 
In all, it is hoped that the major portion of this bias is confined 
to the nature of the reading selections used and that, once these are 
given, the astute reader should be able to infer the bias and answer 
accordingly. The possible exceptions which are clearly evident are 
item 18 and item 28, particularly with respect to foil 28B (28D, ). 

It is being argued that bias is unavoidable and, hopefully, can 
only be minimized in its adverse effect upon student performance. The 
clear thinking student should be able to recognize and adopt a number 
of points of view concerning any particular subject matter and apply 
logic, once the point of view is assumed, in order to arrive at 
reasonable conclusions. So long as the logic which follows from the 
assumptions cannot be faulted, the system itself can remain intact. 
The purpose of presenting the development of the experimental test in 
such detail was to expose the logic of the test, including the reasons 
why the foils are considered to be wrong (i.e. its construct validity), 
to such reasonable attacks as may be made. If the construct validity 
of the test is supported, on both logical and evidential grounds, then 
the experimental test may be regarded as an effective measuring 
instrument of higher mental processes independent of the assumptions 
upon which the particular items are based. In this case the refutation 
of these assumptions by subsequent evidence will not diminish the value 
of this procedure as a method for the construction of measuring 
instruments for the evaluation of student performance in the cognitive 
domain (i.e. in the use of higher mental processes). 
APPENDIX C 

LOGICO-SEMANTIC ANALYSIS OF RIGHT 
AND WRONG ANSWER CLUSTERS 

This appendix presents a detailed discussion of the items and 
foils which differed in their advance classification from the 
classification of the clusters in which they occurred. A cluster was 
classified by the most frequently recurring advance classification in 
the cluster. 

The findings for this part of the study were summarized on pages 
[illegible] for the right answer clusters and on pages 71 to 79 for the 
wrong answer clusters. This detailed analysis is given here for two 
reasons. First, it was felt that the effective reclassification of 
alternatives represented evidence in support of the multiple 
interpretation hypothesis. Second, it was felt that subsequent researchers 
might find value in an independent evaluation of the logic which led to 
the conclusions this study has presented. 


The Meaningful Interpretation of Item Clusters 
In an exploratory study into a new area of research, the 
relative relevance of characteristics can be expected, in general, to be 
unknown. For this reason, the failure of the advance classification of 
items to provide much assistance towards a meaningful interpretation of 
the data was disappointing, but not surprising. 

On the other hand, the cluster solution used replicated the 
advance classification by Bloom's Taxonomy to the extent that 40 per 
cent of the items which appeared in a single cluster also held a common 
advance classification. Table 11 (see: p. 73) gives the clusters from 
Group A and indicates which items were in a common classification, the 
classification of these items, and the final interpretation given to 
each cluster. 

Where the interpretation remained ambiguous in Table 11 
(see: p. 73) the uncertainty is indicated. It would be a fairly simple 
matter with clusters like C6 and C10, for instance, to assume that the 
advance classification of these items adequately interprets the cluster. 
In other cases, such as C1, C5, and C7, the majority of the items 
were in a common class. If the class of the majority is used as an 
interpretation, the members which did not share this common 
classification must be explained. 

Finally, there were several clusters, C2, C3, C4, C8, and C9, 
which did not contain even two members which shared a common advance 
classification. The interpretation of these clusters would seem most 
problematic, but must be attempted. 

Several considerations were used in an attempt to arrive at an 
unambiguous meaningful interpretation of each cluster. The first and 
most obvious one was the advance classification of the items by Bloom's 
Taxonomy. 

Second, the possibility that some clusters (Cy for instance) 
might be content clusters could not be entirely discounted. 

Third, there was the possibility that items might have been 
misclassified in either of two possible ways. The aspect of the item 
which leads to the misclassification might be related to some obvious 
but irrelevant format characteristic. This problem has already been 
illustrated by the case of items 19 to 24 inclusive (see: Appendix B, 
pp. 175-177). Alternatively, there may be a discrepancy between the way 
in which the examiner intended that the item be interpreted and the way 
it was, in fact, interpreted by the examinees. For instance, an item 
which is a comprehension item for some students may well be an analysis 
item for others. This latter possibility would suggest that the 
classification of items might be better after their characteristic 
clustering has been determined than before the test is given. 

Fourth, since this study is postulating that the foils may have 
some effect upon the interpretation of the item and, therefore, the way 
in which it is answered, the nature and selection ratio of the foils 
should also be taken into account when an attempt is being made to 
interpret the clusters. In this respect, foils with a selection ratio 
of .05 or less were dropped from this and subsequent analyses, since 
these foils were selected by too few people for the statistics pertinent 
to these foils to be stable. 
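
The screening of foils by selection ratio can be sketched in a few lines of Python. This is an illustration only and not the procedure actually programmed for this study: the function name, the foil labels, and the ratios in the example are hypothetical, and the selection ratio is assumed to be the proportion of examinees who chose the foil.

```python
def drop_rare_foils(selection_ratios, threshold=0.05):
    """Split foils into those kept and those dropped.

    selection_ratios maps a foil label to the proportion of examinees
    who chose it. A foil is dropped when its ratio is at or below the
    threshold (".05 or less" in the text).
    """
    kept = {foil: ratio for foil, ratio in selection_ratios.items()
            if ratio > threshold}
    dropped = sorted(foil for foil, ratio in selection_ratios.items()
                     if ratio <= threshold)
    return kept, dropped


# Hypothetical ratios, for illustration only.
example = {"foil_a": 0.03, "foil_b": 0.05, "foil_c": 0.21}
kept, dropped = drop_rare_foils(example)
print(kept, dropped)
```

A foil sitting exactly on the threshold is dropped here, which follows the wording "of .05 or less".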

Finally, there were other sources of information about these 
items, such as the interitem phi correlation coefficient matrix and the 
item consistency, which might have proved useful in the attempt to make 
an unambiguous interpretation of each of the item clusters. 

In the discussion which follows, each of the ten item clusters 
is dealt with in turn in an attempt to establish an unambiguous 
interpretation for each cluster. In advance of each discussion a table 
appears which supplies the following information: 

1. The numbers of items in the cluster. 

2. The subject matter content from which each item is drawn. 

3. The advance classification of each item. 

4. The biserial correlation (r_b) of each item with the total 

6. The selection ratio for each foil and the advance 
classification of each foil for the items in the cluster. 
The foils which are dropped are also indicated. 

Any other information needed for the discussion is supplied in 
the context. Table 42 follows on page 190. 

Three of the four items in Table 42 have their content clearly 
drawn from the Stupidity reading selection on pages 41 and 42, and the 
fourth one, item 8, has a reference to this selection in its stem. 
However, item 8 can be answered without having read this selection since 
a good student should be able to infer what is meant by the phrase 
"contempt for stupidity" from the context. Also, foil 28D3 was the only 
part of item 28 which contained a reference to this selection but, once 
again, it should be possible to infer the meaning from the context. 
Furthermore, foil 28D3 is classified as an IA (Invalid Assumption) foil, 
the invalid assumption for which can be arrived at without reference to 
the "Stupidity" selection. In addition, this cluster does not exhaust 
the items in which a reference to this selection is made. There are 
five other such items. For these reasons, it is not possible to 
interpret this cluster unambiguously on the basis of content. 

The relative magnitude (.378 to .456) of the r_b (biserial 
correlations with total scores) varies sufficiently to suggest that they 
were probably not related to the statistical artifacts which caused 
these items to form a cluster. Also, since none of the difficulties or 
selection ratios (D,) are large enough that these items must (of 
necessity) overlap, the D, ratios cannot be considered to be relevant in 
this event either. 


Since three of the items were classified in advance as analysis 
items, whereas the fourth one (item 28) is Synthesis as this class was 
defined, the reasonableness of retaining the Analysis classification for 
this cluster is greater than for changing it to a content-oriented 
classification. Also, as has already been noted, item 28 has identical 
foils by class with item one. Furthermore, in three of these items 
(1, 8, 28) the most commonly selected foil has an advance foil 
classification of OG (Overgeneralization). In this case, if 2D, could be 
reasonably reclassified from Substitution to OG, an alternative to the 
reclassification procedure is to suggest that the categories of foil 
given above may not be independent. In this case, the common element to 
these items would be the common classification of the most commonly 
selected foil. This argument would be strongly supported if all of 
these foils fell into a common wrong answer cluster, which they did not 
do (28D, is the exception). It is reasonable to be reluctant to 
classify a cluster derived from the right answer correlation matrix of 
Group A on the basis of the performance of foils to the items when 
performance on specific foils is not part of the statistical basis from 
which this cluster is derived. In this cluster, the classification was 
(reluctantly) retained as Analysis, even though this meant reclassifying 
item 28. Table 43 follows on page 192. 

To begin with, in Table 43 there was no consistency between the 
items in Cluster C2 with respect to their content (information 
background) which might have accounted for the formation of this cluster. 
A similar statement can be made for the advance item classification, for 
the relative magnitude of the r_b and the D, coefficients, and for the 
advance classification of the foils. 


Superficially, then, there would seem to be no basis for the 
interpretation of this cluster. However, an examination of the foil 
classification for item 30 is revealing (see: pp. [illegible]). Foil 30D, was 
an RT (Redefinition of Terms) foil. Two and possibly all three of these 
foils are related to Comprehension-type operations (i.e. they are 
Misreading type foils). If the foils of an item could be eliminated by 
comprehension-type strategies, the fact that the stem-right-answer 
relationship involved a synthesis-type relationship may have been 
irrelevant. Similarly, if the stem-right-answer relationship can be 
recognized by comprehension-type strategies without having to eliminate 
foils, an item may be a comprehension item with high level foils. Such 
a combination of arguments could account for item 3 and item 30 occurring 
in the same group. In order to account for the presence of item 17 in this 
cluster, it is necessary to suggest that the OG, OS, etc., type foils 
are related to analysis-type strategies. For some individuals, then, 
item 3 could have been treated as an analysis item because of the high 
level of the foils. In this event, item 17 would have to have most of 
the examinees who selected the correct answer in the top one-third of 
the group as defined by total score correct, and there would have to be 
a phi coefficient between items 3 and 17. The results support this 
contention. In the first place, about 80 per cent of the individuals 
who answered the item correctly were in the top 40 per cent of the 
group. In addition, the phi correlation coefficient between item 3 and 
item 17 was [illegible], which is significant at a probability level of 
.02 > p > .01. 
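
A probability band of this kind for a phi coefficient can be checked with a short calculation. The sketch below is one common approach and is not necessarily the test used in this study: it assumes the identity chi-square = N x phi squared with one degree of freedom. The group size and the phi value in the example are hypothetical, since neither is legible in this section.

```python
import math


def phi_p_value(phi, n_examinees):
    """Two-sided p-value for a phi coefficient from a 2 x 2 table.

    Assumes chi-square = N * phi**2 with one degree of freedom. For one
    degree of freedom the upper-tail probability of chi-square equals
    erfc(sqrt(chi_square / 2)).
    """
    chi_square = n_examinees * phi ** 2
    return math.erfc(math.sqrt(chi_square / 2.0))


def in_probability_band(phi, n_examinees, upper=0.02, lower=0.01):
    """True when the p-value satisfies lower < p < upper."""
    return lower < phi_p_value(phi, n_examinees) < upper


# Hypothetical illustration: phi = .24 observed on 100 examinees.
print(in_probability_band(0.24, 100))
```

With a different group size the same phi value would fall in a different band, so the check is only meaningful once the actual N is supplied.
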

These results did not make possible the unambiguous 
interpretation of Cluster C2. On the contrary, the interpretation would 
seem to be that this cluster involves multiple strategies, some at the 
comprehension level, and some at the analysis level. Hence C2 cannot 
be defined in terms of a unitary category from Bloom's Taxonomy. 

Once again, the data suggest right-wrong answer interactions. 
In addition, there appeared to be a multiple-strategy level involvement 
in the cluster. 

Cluster C3, Table 44 (see: p. 195), proved to be somewhat 
similar to C2 in that the same phenomena occurred once again. All the 
initial bases for interpreting this cluster failed to provide for an 
unambiguous decision. In addition, the high level (Synthesis) item 
seemed to be lowered by virtue of the low level foils. 

It may be reasonable to relate items 4 and 6 because of the fact 
that 4D, was reclassified as a CM (Common Misconception) in the 
interpretation of the wrong answer clusters and “D, became unclassifiable. It is 
possible that these are both low level foils which might lower the 
analysis classification of item 4 to comprehension. This argument 
concerning the reclassification of item 4 must remain inconclusive since 
a classification of 4D, was not established. 

The fact that these two clusters (C2 and C3) proved to be 
similar raises the question as to why they did not form a single 
cluster. Obviously, the items from one cluster did not correlate 
highly with the items from the other, but this fact does not add any 
information since it is this fact which is the statistical basis for 
the formation of clusters. A look at the wrong answer clusters 
(indicated by W,) into which the foils fell proved illuminating. 

Table 45 (see: p. 196) shows that there is a degree of 
similarity within C2 of the wrong answer clusters (W, occurs in two 
items). C3 has a high level of similarity among the foils 
(W, occurs in all three items and W, in two). There are no common 
wrong answer clusters between the two groups. This evidence would have 
been much more conclusive as to why these clusters were distinct if the 
foil groups in C2 were more strongly similar. Once again, however, 
there is some indication that foils may influence the formation of 
right answer clusters. 

In any event, the fact that there seem, once again, to be 
multiple strategies involved in this cluster means that it cannot be 
unambiguously classified. 


TABLE 45 

WRONG ANSWER CLUSTER MEMBERSHIP 
WITHIN AND BETWEEN RIGHT ANSWER 
CLUSTERS C2 AND C3 

Item Wrong Answer Item Wrong Answer 
As Table 46 shows (see: p. [illegible]), there is no clear basis for 
the unambiguous classification of Cluster C4 from any of the sources of 
data being used. This cluster, therefore, remained unclassified. 

On the other hand, item 5, in addition to involving a practical 
application, also involves "going beyond the given data to determine 
implications...which are in accordance with the conditions described 
in the 'original communication'" (Bloom's Taxonomy, 1956, p. 206). This 
could mean that the best Bloom's Taxonomy classification for this item, 
if the application aspect is ignored, is comprehension (2.30: 
Extrapolation). This item may, therefore, be capable of a dual 
classification. Item 19 is the only one in this series in which the 
correct answer involves the implication of the statement by the reading 
selection. Item 14 also involves extrapolation except that the 
extrapolation is from two selections rather than one, making this item 
(arbitrarily) a Synthesis item. It is too early in the development of 
this testing technique to be dogmatic on a post hoc basis about the 
interpretation, on a strategy basis, of any cluster. This statement is 
particularly reasonable since this clustering does not cross-validate 
(see: p. 94). However, it does suggest that a better definition of 
the strategies which may be involved in the answering of 
multiple choice items for the types represented on this test might 
improve the effectiveness of the advance classification of the item, 
and most certainly would improve the interpretation of the clusters 
which were found to be peculiar to a particular group of examinees. 

In Cluster C5, Table 47 (see: p. 199), the common element, on 
the basis of the decision rule that the cluster be identified by the 
most frequent process category from the advance class, would be 
Analysis. Item 7 was originally classified as an Application item 
because the stem asked for "the best example." In other items the 
lowering of the level of performance of that item by low level foils 
was observed. In this case the presence of the analysis-related OS 
(7D3) foil may have had the opposite effect. Furthermore, foil 7D3 
(an OS foil) was classified during the interpretation of the wrong 
answer clusters as NS (Non Sequitur) (see: p. 225 for definition), which 
was one of the three new foil categories which came out of these 
discussions. Since both of these categories (OS and NS) seemed to be 
more related to the logic of the item than to its semantics, these types 
of foil may help to define analysis type items. This interpretation was 
strengthened by the fact that about 73 per cent of the examinees who 
chose 7D, (the WW foil) were in the bottom 60 per cent of the group, 
whereas about 65 per cent of the examinees who chose 7D3 (the OS foil) 
were in the top 60 per cent of the group. These figures suggested a 
moderate but definite trend on the part of these foils to move the 
performance of this item upward in level. The same trend is evident in 
the average total-correct values for each of these foils and for the 
right answer. Those who chose 7D. had an average total correct score 
of 11.6, while those who chose 7D3 had an average total correct score 
of 12.2, and those who chose the correct answer had an average total 
correct score of 13.7. The average score on the entire test was 12.2. 
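
The two kinds of figure quoted above (the average total-correct score of the examinees choosing each alternative, and the share of those examinees falling at or above a cutoff score) can be tabulated as in the following sketch. It is offered only as an illustration of the arithmetic: the function names, the alternative labels, the scores, and the cutoff are hypothetical and are not data from this study.

```python
from collections import defaultdict


def mean_total_by_choice(records):
    """Average total-correct score for each chosen alternative.

    records is an iterable of (choice_label, total_correct) pairs,
    one pair per examinee.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for choice, total in records:
        sums[choice] += total
        counts[choice] += 1
    return {choice: sums[choice] / counts[choice] for choice in sums}


def share_at_or_above(records, choice, cutoff_score):
    """Proportion of those choosing `choice` who scored >= cutoff_score."""
    totals = [total for label, total in records if label == choice]
    if not totals:
        return None
    return sum(1 for total in totals if total >= cutoff_score) / len(totals)


# Hypothetical responses to a single item, for illustration only.
responses = [("WW", 10), ("WW", 12), ("OS", 12), ("OS", 14), ("right", 15)]
print(mean_total_by_choice(responses))
print(share_at_or_above(responses, "OS", 12))
```
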

The basis for the upgrading of item 7 to the Analysis level is 
somewhat tenuous. The fact that it would otherwise be the only anomaly 
in this cluster strengthens the use of the decision rule. The use of 
the rule is further strengthened by the fact that none of the possible 
bases being used for interpretation give a more reasonable explanation 
Analysis classification of this cluster was 

Cluster C6, as summarized on Table 48 on page 202, was 
classified as Analysis since both its members were so classified in 
advance. However, the procedures being used suggest at least two other 
bases for interpreting this cluster. First, the difficulties (D, 
selection ratios) were high. Second, the most commonly selected foils 
fell into the same wrong answer cluster (W,). These two events could 
be related to each other since the advance classification of the foils 
with their respective items are different. 

In any event, C6 was treated as an Analysis cluster in 
subsequent statistical analysis. 


In Table 49 on page 203 none of the bases being used for 
interpretation assist in the explanation of the formation of cluster C7 
except the arbitrary rule that the majority of the items shared a 
common classification in advance. In both these items (Item 10 and 
Item 16) a value judgment is specifically asked for in the stem and 
explicitly included in each of the alternatives. Another item 
(Item 18) which had these same characteristics fell into another 
cluster. Item 29 asks for "the most important consideration" in the 
stem, which makes the stem contain an explicit request for a value 
judgment; however, this value judgment is not explicit in the 
alternatives. The term "likely" occurs in the stem of Item 11 suggesting, 
perhaps, that an implicit value judgment may be involved in this stem. 
Should the definition of the Evaluation level of items as used in this 
study be extended to include implicit as well as explicit value 
judgments? The results of the cross-validation part of the study 
suggested that Group B responded quite differently to these items. 


In Appendix B where Item 26 from Cluster C8 in Table 50 
(see: p. 205) was discussed (see: pp. [illegible]) it was pointed out that 
an inductive structure had to be generated in the mind of the examinee 
in order to answer this question, which led to its classification as a 
Synthesis item. All other Synthesis items had the additional 
characteristic of involving more than one reading selection. The use 
of the device of having more than one reading selection as a basis for 
Synthesis items did not generate a unique cluster. On the other hand, 
if the logic of Item 12 and Item 20 is examined, it becomes evident 
that a similar process of reasoning to Item 26 may have been involved 
in these two items as well. 

The classification of Item 20 as Analysis was made because it 
is clearly structured so as to involve a hypothesis testing procedure. 
If each of the alternatives in Item 12 and Item 26 were also regarded as 
hypotheses which were to be tested against the inductive structure 
employed by the examinees in Group A, the answering of these items may 
have been relatively homogeneous. 

This cluster (C8) is classified as Synthesis on the grounds used 
for the advance classification of Item 26, or it could be Analysis on 
the grounds used for the classification of Item 20. However, in 
multiple choice format it is impossible to avoid hypothesis testing 
aspects of a Synthesis item when a specific set of alternatives is given 
in the item. The type of item in this cluster came about as close to a 
Synthesis level as it may be possible to come in multiple choice items. 
Bloom (1956) suggests that if ambiguity of classification occurs, it 
should be resolved in favour of classifying to the highest possible 
level. For this reason, on the basis of the performance of these 
items (at least within Group A), the items in this cluster have been 
reclassified as Synthesis items. 

Both Cluster C9 items, as summarized in Table 51 (see: p. 207), 
are from the same reading selection. Also, the Procrustes rotation to 
content suggested a possible content basis for this cluster. However, 
both of them also proved to be very poor items on the basis of the 
combination of their r_b and difficulty (D, selection ratios). In 
addition to this, their most common foils occurred in the same wrong 
answer cluster (We) and the De selection ratios of these two foils were 
very high. It is probable that these two items form a cluster, at least 
partly, on the basis of the relationship between these two foils. This 
cluster remained unclassified in subsequent statistical analysis. 

There would seem to be two possible bases for the interpretation 
of Cluster C10 as summarized in Table 52 (see: p. 208): content and 
advance classification. It is possible that both of these factors were 
operative to make this cluster distinct from other Analysis clusters. 
Some support for this argument may be found in the fact that C10 was 
positively correlated with all the other classified clusters in the 
[illegible] analysis (see: Table 20, p. 69) except Cos, although 
all of these correlations may have been too small to have much meaning. 

In summary, then, this interpretation attempt upheld the 
Analysis level classification of Clusters C1, C5, C6, and C10 as they 
emerged in the comparison between the results of the minimal interpoint 
distance cluster analysis and the advance classification of the items. 
It similarly upheld the classification of C7 as Evaluation. The 
procedure led to the reclassification of one cluster (C8) from 
undetermined to Synthesis. In the remaining clusters (C2, C3, C4, and 
C9) it was impossible to provide an unambiguous basis for classifying 
these clusters from the available data. Hence these clusters remained 
unclassified. Clusters C2 and C3 seemed to involve some form of multiple 
strategies, and C4 seemed to be a Comprehension level cluster for Group A 
if the superficial characteristics of the items which led to their advance 
classification were ignored. Since this cluster did not reappear in Group 
B, multiple strategies between groups of examinees may be involved. 

Only in the case of C9 can content be said to be a more 
reasonable interpretation of these clusters than some form of single or 
multiple strategy. Even in this case, the content interpretation is in 
some doubt, suggesting that this test is essentially "process oriented," 
as intended. 

From the original 30 items on the test, 12 items (40 per cent) 
formed at least pairs in the clusters which emerged. This 
"interpretability" figure was improved to 19 items (63 per cent) on the 
basis of the interpretation procedure used. 





The Meaningful Interpretation of Wrong Answer Clusters 
A similar interpretative procedure was used for wrong answers 
as for the items. The considerations were taken in the following order: 
1) advance classification, 2) information background (content), 3) 
statistics of foils, 4) logico-semantic analysis. A cluster was again 
assumed to be identified by the most frequently recurring advance 
classification. Foils of the O (Other) category were assumed to be part 
of this common classification. 
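
The identification rule stated above can be expressed as a short routine. The sketch below is only an illustration of that rule in Python and was not part of the original analysis: the function name is hypothetical, and the handling of ties (returning no classification) is an assumption, since the text does not say how ties were resolved. The example uses the five advance classes reported for the foils of Table 53.

```python
from collections import Counter


def classify_cluster(advance_classes, wildcard="O"):
    """Identify a cluster by its most frequent advance classification.

    Members in the wildcard ("Other") category are counted with the
    winning class. Returns (class_label, supporting_count), or
    (None, 0) when no named class exists or the top classes are tied
    (the tie rule is an assumption made for this sketch).
    """
    named = Counter(c for c in advance_classes if c != wildcard)
    if not named:
        return None, 0
    ranked = named.most_common()
    top_class, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, 0
    n_wild = sum(1 for c in advance_classes if c == wildcard)
    return top_class, top_count + n_wild


# Advance classes of the five foils listed in Table 53.
print(classify_cluster(["OG", "Sub", "O", "OG", "OG"]))
```

Under this rule the one foil that remains outside the common class (the Sub foil in the example) is the member whose classification has to be re-examined.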


Of these four, the only one related to the logico-semantic 
characteristics of the foils was their advance classification. If these 
clusters could not be accounted for in other ways, and the 
logico-semantic characteristics of the foils can be shown as a possible basis 
of cluster membership, then this latter basis may be the best available 
interpretation. It has already been shown that the advance 
classification of both right answers and foils did not survive very well in the 
cluster analysis. It has also been shown that this classification can 
be improved by examining the clusters for some relatively unambiguous 
basis for interpretation. In the case of foils this involved a 
re-examination of their logico-semantic structure. The five bases used in 
an attempt to interpret the wrong answer clusters were, once again: 

1. The advance classification of the foil. 

2. The content of the foil. 

3. The selection ratio of each foil. 

4. The relationship between right answer and wrong answer 
clusters. 

5. The reconsideration of the logico-semantic structure of the 
foils. 

In some cases, material from other sources such as Powell and 
Isbister (1969) was used to assist in this interpretative process. 
Finally, one cluster (w.,) was completely lost by virtue of the low 
selection ratio among all of its members. In addition, two other 
clusters were reduced to single members by this procedure. In the 
discussions which follow some tentative attempts are made to account for 
the "special cases" as well as the "general trends" in each cluster. 

For the convenience of the reader, whenever a particular foil is 
being discussed for the purpose of reclassification, the stem of the item 
and the particular foil are both given ahead of the pertinent discussions. 
Table 53 on this page gives information for the interpretation 
of wrong answer cluster W1. 

TABLE 53 

INFORMATION FOR THE INTERPRETATION 
OF WRONG ANSWER CLUSTER W1 

                           Advance 
Foil     Content           Classification     De (a) 
1D,      Stupidity         OG (b)             .83 
2D,      Stupidity         Sub                [illegible] 
22D.     Progress          O                  .84 
8D,      Aggression        OG (b)             [illegible] 
17D,     Discipline        OG (b)             .80 

a. The symbol De refers to the selection ratio. 

b. Foils which had appeared in a common category in the advance foil 
classification. 

The content from which these foils in wrong answer cluster W1 
are drawn comes from a broad spectrum of the test, ruling out content as 
a possible basis for the interpretation of this wrong answer cluster. 

The phi coefficients upon which this cluster is based are 
dependent upon the size of the overlap between particular pairs and upon 
the marginal totals. When the selection ratios for two alternatives 
both exceed .50 there is a tendency for the range of phi to be shifted 
positively. In this case, all of the selection ratios in the cluster 
exceed .50 and for this reason the sizes of the selection ratios could 
be a contributing factor to the statistical formation of this cluster. 
If this event were the only factor, however, it would be reasonable to 
have expected more of the other six foils which have selection ratios of 
greater than .50 in this cluster, or in a limited number of other 
clusters. In fact, they occurred in four of the clusters. 
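
The dependence of phi on the marginal proportions can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions and is not taken from the original analysis: each selection ratio is treated as the marginal proportion of a dichotomous variable, the function names are hypothetical, and the ratios .83 and .80 are the two values legible for this cluster in Table 53. The joint proportion is not reported, so only the attainable limits of phi are computed.

```python
import math


def phi_from_proportions(p1, p2, p_both):
    """Phi coefficient for two dichotomous variables.

    p1 and p2 are the marginal proportions and p_both is the
    proportion of examinees positive on both variables.
    """
    spread = math.sqrt(p1 * (1.0 - p1) * p2 * (1.0 - p2))
    return (p_both - p1 * p2) / spread


def phi_bounds(p1, p2):
    """Smallest and largest phi attainable with fixed marginals."""
    least_overlap = max(0.0, p1 + p2 - 1.0)
    most_overlap = min(p1, p2)
    return (phi_from_proportions(p1, p2, least_overlap),
            phi_from_proportions(p1, p2, most_overlap))


# Marginals of .50 allow the full range; marginals above .50 do not.
print(phi_bounds(0.50, 0.50))
print(phi_bounds(0.83, 0.80))
```

When both marginals exceed .50 the two alternatives are forced to overlap, so the lower limit of phi rises well above -1.00, which is the positive shift described in the text.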

Of the five foils in this cluster, three were in items 
which occur in a common right answer cluster (C1). This finding was 
suggestive, once again, of a right-answer wrong-answer interpretation, 
but insufficient to lead to an unambiguous interpretation of the present 
cluster under discussion. 

Also, three of the five foils in Table 24 were classified as OG 
(Overgeneralization, i.e. 1D, 8D, 17D, ) and a fourth one as O (Other, 
i.e. 22D.), which means it could be treated as an OG as well. Thus, the 
advance classification seemed to be the most promising basis for 
interpretation, making it necessary to re-examine the classification of 
2D. for its logico-semantic relationship with the stem. 


Item 2: Which of the following factors is the most important 
causative factor of contempt for stupidity? 

2D, Compulsory school attendance. (Sub) 

Probably the best procedure in the analysis of this foil is to 
use a Venn Diagram, as given in Figure 3, page [illegible]. 

Figure 3 shows clearly that "contempt for stupidity" and 
"compulsory school attendance" (Foil 2D, ) are disjunctively related. 
For a factor to be "causative" it must be either conjunctively or 
implicatively related to the other factor. Either conjunction or 
implication is a necessary but not a sufficient condition for causation. 
FIGURE 3 
VENN DIAGRAM ILLUSTRATING FOIL 2D, 

A. Compulsory school attendance. 

B. Contempt for stupidity. 


It would be reasonable to suggest that the student who replaced this 
disjunctive relationahip with a conjunctive relationship, (ignoring the 
fact that either can occur without the other) is substituting one 
category for another. It was upon this basis that this foil (2D, ) was 
originally classified as a Sub (Substitution). 

On the other hand, if the student considers the relationship as 
implicative, (i.e. compulsory school attendance can occur without 
contempt for stupidity but not vice versa) this interpretation could be 
considered an OS (Oversimplification). The thinking in this case 
involves proceeding from an entire set to a subset of that entire set 
as indicated in the use of Arrow #1 in Figure 3. 

If the student begins with the conjunctive subset and extends
this to include all cases of contempt for stupidity, the thinking
process would follow the path of Arrow #2 in Figure 3. In such a case
(proceeding from a subset to an entire set) the most reasonable
interpretation of the thinking is OG (Overgeneralization). These
arguments suggest the importance of the way in which an item is
interpreted with respect to the way in which a foil should be classified.
As just shown, this foil (2D) can be reasonably argued to have at least
three possible classifications depending upon the interpretation placed
upon the foil by the examinee.

A decision rule was needed to deal with foils which might 
reasonably be classified in several different categories. Where a 
cluster seemed, in general, to reflect one category of foil, and the 
same interpretation was one of the possible classifications of the 
ambiguous foil, then this classification was assumed to be an appropriate 
interpretation of the ambiguous foil for the particular group on which 
the cluster analysis was conducted. The most important characteristic 
of this rule was the requirement that the interpretation of a foil 
should probably not be generalized beyond the group of examinees upon 
whom the interpretation was established. 

Hence, since 2D, could be an OG, this wrong answer cluster could 
be interpreted as an OG cluster for Group A. 
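The decision rule can be sketched in a few lines of code. This is a hedged illustration only; the function name `resolve_ambiguous` and the treatment of ties are assumptions, not a procedure stated in the study. The example uses the cluster just discussed, in which three foils were classified OG in advance, one as O, and foil 2D could be argued to be Sub, OS, or OG.

```python
from collections import Counter

def resolve_ambiguous(cluster_classes, alternatives):
    """Apply the decision rule to one ambiguous foil.

    cluster_classes: advance classes of the other foils in the cluster.
    alternatives: classes that can reasonably be argued for the
        ambiguous foil.
    Returns the cluster's dominant class when it is one of the
    alternatives, otherwise None (the rule does not apply).
    """
    if not cluster_classes:
        return None
    counts = Counter(cluster_classes).most_common()
    dominant, top = counts[0]
    # Assumption: a tie for first place means the cluster does not
    # reflect one category "in general", so no class is assigned.
    if len(counts) > 1 and counts[1][1] == top:
        return None
    return dominant if dominant in alternatives else None

other_foils = ["OG", "OG", "OG", "O"]
label_for_2d = resolve_ambiguous(other_foils, {"Sub", "OS", "OG"})
```

Under these assumptions the sketch assigns OG to foil 2D. As the rule itself requires, any such label should be read as holding only for the group of examinees on which the cluster analysis was run.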


Wrong answer cluster W2 was eliminated before logico-semantic
analysis on the basis of the fact that all of the foils in this cluster
had a selection ratio of less than .06.
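The screening step can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `keep_cluster` is hypothetical, the ratios are invented for the example, and treating .06 as a general cutoff is an assumption based only on the sentence above.

```python
def keep_cluster(selection_ratios, cutoff=0.06):
    """Return True if the cluster is kept for interpretation.

    selection_ratios: mapping of foil label -> selection ratio.
    cutoff: assumed threshold; a cluster is set aside when every
        foil in it falls below this value.
    """
    return any(ratio >= cutoff for ratio in selection_ratios.values())

# Hypothetical ratios, not taken from the study's tables.
weak_cluster = {"foil_a": 0.02, "foil_b": 0.05, "foil_c": 0.03}
mixed_cluster = {"foil_a": 0.02, "foil_b": 0.40}
```

Here `keep_cluster(weak_cluster)` is False and `keep_cluster(mixed_cluster)` is True, since one well-selected foil is enough to keep a cluster under this reading.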

For Cluster W3 (see: p. 215) there is no common content. Items
4, 13, and 6 comprise all the items in right answer cluster C3 but these
items represented less than half of the members of W3. Only one of the
foils has a selection ratio of more than .50, and both OG and CM have
two representatives from the advance classification of foils. Hence, the
interpretation of this cluster could not be established by any of these
methods.

TABLE 54

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W3

                                  Advance
Foil   Content                 Classification   Dn

4D     Stupidity                    OG          [illegible]
26D    Progress                     O           [illegible]
5D     Awareness                    CM          [illegible]
6D     Awareness                    Sub         [illegible]
14D    Stupidity, Awareness         OG          .40
13D    Awareness                    CM          .07
29D    Discipline, Progress         OS          [illegible]



In the ensuing discussions a possible common element among the
foils in a cluster is suggested. The clue which was used in this case
was the presence of OG, OS, and CM (Common Misconception) in the same
cluster of which OG and CM were the most frequent. Powell and Isbister
(1969) found a polarity between OG and OS on the one hand, and CM on the
other. This polarity in a factor matrix indicates a significant negative
correlation between the poles. Since this polarity, at least in part,
could have been an artifact of the mutually exclusive selection of
responses within items, it is reasonable to suggest that OS, OG, and CM
foils may be related. In addition to using the "most frequent" rule, it
might be possible to reinforce the interpretability of this cluster as
CM if most of the remaining foils had CM as one of their possible
alternative classifications.
To begin with, foil 26D, as a member of the O category can have 
any interpretation which is reasonable for the remainder of the foils. 
A discussion of the others follows: 
Item 4: The author, in charging that "society teaches contempt
for stupidity and fear of being regarded as stupid" by
means of the school, is assuming that:

4D The school should not be an enforcing arm for the customs
of society. (OG)

This foil is wrong on two counts. In the first place it is a
conclusion rather than an assumption made by the author. Second, it is
overstated by containing a value judgment which may be unwarranted.
This second reason led to its OG classification. It is, however, in
addition, one of the ways of phrasing a very common argument against the
establishment of parochial schools. In this latter context it could be
a CM foil as well. However, this argument is shaky at best, and would
not be likely to extend beyond the context of the group upon which this
cluster was established.

Item 29: On the basis of the foregoing which of the following
seems to be the most important consideration when
preparing anecdotal records or progress reports?

29D3 Make no attempts at interpretation since your judgments
are probably biased. (CM)

29D Be as brief as possible, giving no information which may
cloud the central problem. (OS)

Foil 29D2 was dropped on the basis of its low selection ratio.
Since in the advance classification two foils in the same class in the
same item were not entertained, and since the "brief as possible" part
of the second foil above is an Oversimplification, it was originally
classified as an OS foil. However, the phrase "which may cloud the
central problem" of this foil is similar in its central idea to "your
judgments are probably biased." This idea is another way of saying that
a person should be as objective as possible. Students who emphasized
this idea in their thinking could be responding more to the second part
of the statement than to the first (OS) part, making CM (Common
Misconception) a reasonable alternative classification for this foil.
Of course, there is the problem of the "relevancy of details" which this
foil may also raise. It is, however, better to put in details which the
writer may think irrelevant and an independent observer may not than to
require the independent observer to infer these details from the context
because of their omission. This aspect of the discussion leads to
another common misconception, namely that the simple act of speaking or
writing has produced a successful communication.


Item 6: The purpose of developing a generalized intellectual
awareness is to:

6D Stimulate thinking ability within the individual's chosen
field.

The confining of thinking ability to "the individual's chosen
field" is a substitution for the phrase "which is not
contextually bound." Once again, however, the frequency with which this
misconception is encountered suggests that the foil could be regarded as
a CM foil as well as a Sub foil.

Item 14: Which of the following best describes the probable
relationship between contempt for stupidity and
generalized intellectual awareness?

14D3 Contempt for stupidity should be reduced and awareness
should be increased. (OG)

This foil is clearly an OG foil since it adds an unwarranted
value judgment to an otherwise correct statement. Should the
definition of CM foils be extended to incorporate this characteristic?
In any event, six of the seven foils in this cluster could be assigned
fairly reasonably to the CM class, making the CM interpretation of the
entire cluster plausible, if not reasonable. Table 55 on Cluster W4
follows.

TABLE 55

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W4

                                  Advance
Foil   Content                 Classification   Dn

30D    Summary of all passages      RT          [illegible]


W4, as a single member cluster, should probably be dropped
unless a good reason for retaining it can be found. As an approach to
the problem of classifying this Cluster W4 the fate of other RT foils
proved helpful. Foil 11D3 was originally in this same wrong answer
cluster (W4) but was dropped because of its low selection ratio. Foils
13D3 and 15D were both in wrong answer cluster W7 and this cluster
remained uninterpreted. The separating factor between 15D, 27D, and
30D3 may have been on content lines since each of these is from a
different reading selection. Foil 29D2 was also dropped because of its
low selection ratio. Foil 13D3 had similar but not identical background
to 30D3 but was dissimilar to 15D as to content, which was part of the
classification problem of W7. The other single member cluster (W10)
contains foil 29D which was originally classified as CM but could also
be an RT. If the foil in W10 is an RT, the separation is on a content
basis. The low average total correct scores for these two foils (11.1
and 11.0) may be taken as equivalent except for content; hence, this
cluster (W4) was retained as an RT. Table 56 on Cluster W5 follows.

TABLE 56

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W5

                                  Advance
Foil   Content                 Classification   Dn

12D    Aggression                   Inv         [illegible]
11D    Aggression                   OS          .30
12D    Aggression                   OG          [illegible]
19D    Progress                     O           .14
20D    Progress                     O           [illegible]






By the rules used thus far, except for the logico-semantic
interpretation, Cluster W5 should be classified as O (Other) since this
is the most frequently occurring equivalent advance foil class. The
unusual element in this cluster, however, is the fact that there are two
foils from the same item in this cluster. This event challenges the
assumption behind the rule for classification discussed earlier.

Since the reclassification of any one of foils 12D (Inv), 11D, or
12D (OG) to the same category as either of the others would give that
classification to four out of the five foils in this cluster, and since
content, the size of the Dn, and the distribution in right answer
clusters of the corresponding items do not account for this cluster, a
logico-semantic analysis of the two foils in item 12 would seem
reasonable.


Item 12: Aggressive behavior in female children is:

12D More unpredictable and expressed differently than in males
of the same age group. (OG)

12D Less differentiated in expression than in males of the
same age. (Inv)

The second of these foils was classified as Inv (Inversion) because
it is opposite to a true statement. It is not opposite to the actual
correct answer but to a statement that could have been used as an
alternative correct answer if the examiner had so chosen. On the other
hand, the first foil was classified as OG because the first two words
(more unpredictable) form an incorrect statement added to a correct
statement. However, these two words are incorrect by virtue of being
opposite to the truth (Inv) when the restriction "of the same age group"
was applied to this statement.

This last property reinforces the Inv (Inversion) classification
of this foil. It may be argued that two Inv foils were possible in
Item 12 because of the complex logical structure which this item
required before it could be answered. (See: the discussion of right
answer cluster C, on pages 118 to 120.)

Item 11: Overt aggression would likely be decreased by:

11D Blocking many modes of aggression. (Inv)

11D2 Lessening the threat of punishment. (OS)

Lessening the threat of punishment, or permissiveness, does not,
by itself, either increase or decrease overt aggression. It is on these
grounds that this foil was classified as an OS. Overt aggression will
be likely to increase if permissiveness develops frustration. The
energy of the children must be channelled into alternative directions if
overt aggression is to decrease in a permissive setting. Thus, it is an
oversimplification to say that aggression is likely to either increase or
decrease in a permissive setting. However, lessening the threat of
punishment, in the absence of alternatives, will probably increase overt
aggression. This foil could have been classified as an Inv if it were
not for the fact that the other 11D foil was already so classified. An
alternative possible classification for that foil is given with the
discussion of W8 (see: p. 227).

Perhaps the reclassification of 11D2 as Inv is a bit tenuous.
The other four foils in this cluster are not on such shaky ground, so
that it is reasonable to reclassify wrong answer cluster W5 as Inv.
Information for the interpretation of Wrong Answer Cluster W6 appears
in Table 57 on page 222.

Short of a close logico-semantic analysis of the characteristics
of the foils in Table 57, there is no clear basis for the interpretation
of cluster W6.

TABLE 57

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W6

                                  Advance
Foil   Content                 Classification   Dn

9D     Aggression                   Inv         [illegible]
25D    Progress                     Sub         .24
21D    Progress                     O           [illegible]
7D     Awareness                    OS          .51
26D    Progress                     Irr         .13

It should be pointed out that the fact that these foils have
formed a cluster has led to the assumption that there must have been a
common logico-semantic element in these foils, at least so far as the
members of Group A were concerned. The argument for the formation of a
new class of foil (namely: Non Sequitur--NS) which follows should not
be construed to deny the plausibility of the original classification of
the foils in this cluster but only to suggest the inappropriateness of
these classifications for this particular group of examinees. By this
point in the study, it had become clear that the foil categories were not
mutually exclusive, and that it seemed possible to classify at least
some foils in several different ways (see: 2D). Thus, the
foil interpretations presented here most probably apply only to Group A.


Item 9: The basic position of the author in writing about
aggression is that it:

9D Can be eliminated through the process of socialization.
(Inv)

On the contrary, Kagan and Moss (1962) assume that aggression is
one of several innate behavior systems. Being innate makes its
elimination impossible. However, the socialization process can channel
aggression away from its more destructive aspects. A number of
classifications of this foil are possible. Once again, these
classifications are dependent upon several possible logico-semantic
distinctions which can be made. Most simply, the foil does not follow
from the data, i.e. it is a non sequitur relationship. There was,
however, no such classification in the advance classifications. The NS
(non sequitur) category was eliminated, as indicated above, on the basis
that it displayed experimental dependencies with CM foils in the Powell
and Isbister (1969) study.

Item 25: The most useful suggestion to help Chester is:

25D To give Chester personal warmth, acceptance, and support
wherever it is appropriate. (Sub)

As the right answer indicates, Chester needs "concrete help in
getting started on specific tasks." In other words his problems would
seem, from the progress report, more developmental than emotional. The
procedure given in 25D substitutes a treatment procedure designed to
deal with emotional problems for a procedure designed for developmental
problems. By itself, the use of 25D is simply inappropriate.
Reclassifying 25D into a new NS category would seem to be reasonable.

Item 21: Chester is growing very slowly and really quite
immature for his grade. Everyone expects too much of
him.

21D3 This hypothesis is refuted by the facts. (O)

On the contrary, the only information directly available about
Chester's physical development is the number of days he has been absent
or tardy. This limited information is insufficient concerning the
physical areas of development given in Item 21 to form any conclusions.
He shows some indications of specific academic immaturity, and perhaps
some social immaturity, but this is all. As to whether or not the
expectations made of him are unreasonable, it cannot be decided from the
information given. The only expectation statement made about his work
is "Is not ready for division" which does not sound like a statement of
overexpectation.

Of course, 21D3 could have been reclassified without this
analysis because of its O (Other) advance classification. However,
its relationship to the correct answer supported the use of an NS
category for this foil. Of the other foils in this item, one was
dropped for underselection, and another formed part of one of the two
unnamed new categories to emerge from this study (W13). This
information also implies the reasonableness of the formation of a new NS
category for wrong answer cluster W6.


Item 7: Of the following the best example of generalized
intellectual skill is:

7D3 Applying abstract principles to new situations. (OS)

Comparing 7D3 with the "correct" answer "The widely applicable
technique of logic" showed this foil to be without question an OS foil,
since it is a true statement of narrower generality than the correct
answer. As a true statement it cannot be classified as an NS, which
leaves the interpretation of this cluster based on this item in some
doubt. Any alternative interpretations such as involving its large
selection ratio (.51) or suggesting a tenuous link between NS (Non
Sequitur) and OS (Oversimplification) would be premature at this stage.
Cross validation data helps to clarify this cluster somewhat. The fact
that this foil moves in Group B to a wrong answer cluster which is most
prominently composed of members from W7 (the wrong answer cluster from
Group A which proved to be unclassifiable) relieves the problem of
interpreting this cluster somewhat, but does not solve it. Foils 7D
and 25D moved together, as did 21D and 9D, while 26D migrated by
itself to a new cluster. This cluster held together better than any other
in cross validation with the exception of W13.

Item 26: If additional information on Chester is desired and
none of the following has been attempted, which one
would provide the greatest amount of immediately
useful information:

26D A request for the assistance of a guidance counselor.
(Irr)

The Dn (selection ratio) on this foil is .13. It might be
interesting to know who made these selections. Most of the examinees
in this group were practicing teachers and they would know from
experience that the usual information from the guidance counselor would
merely reinforce what they already knew and not add much further useful
information. In the absence of an NS category, this foil is clearly an
Irr.

Of the five foils in this cluster, four could have been
classified quite reasonably as NS foils. The fifth one (7D) could not
have been so classified, meaning that this cluster could not be
unambiguously classified. However, the best interpretation for this
cluster within Group A is to establish a new class of foil, namely:

Non Sequitur (NS): This type of foil is a wrong answer foil by
virtue of the fact that it simply does not follow from the given
information. As a rule some part of the foil stands in direct
contradiction to the logical structure or connotative meaning of
the background information (or part thereof) required to answer
the question.

Wrong answer cluster W6 was classified as containing NS foils,
at least so far as Group A was concerned. Table 58 giving information
for the interpretation of wrong answer Cluster W7 follows.

TABLE 58

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W7

                                      Advance
Foil   Content                     Classification   Dn

10D    Aggression                       CM          [illegible]
19D    Progress                         O           [illegible]
4D     Stupidity                        Irr         [illegible]
13D    Stupidity, Awareness,
       Aggression                       RT          .24
12D    Aggression                       CM          [illegible]
15D    Discipline                       RT          [illegible]
18D    Discipline                       Irr         [illegible]
26D    Progress                         OS          .34
8D     Aggression                       Inv         [illegible]


The information in Table 58 provides no clear basis upon which
to interpret wrong answer cluster W7. It contains two CM's, two RT's,
and two Irr's. In general, the lower Dn's relate to the selection on
aggression (26D is the exception).

No detailed discussion will be given for this group. One
illustration is sufficient. Foils 10D and 12D could conceivably be
reclassified as RT foils on the basis of logico-semantic analysis, as
could 19D4 by the "O" rule. However, such a reclassification is
unreasonable for foils 4D, 18D, 26D, and 8D. Similar findings
occurred for other pairs in this wrong answer cluster.

The inability to interpret this cluster led to its being dropped
from further relevant analysis. This was somewhat unfortunate since
several of these foils have a high selection ratio, meaning that a fair
amount of information was lost by this decision. Nonetheless, it is
reasonable to argue that if a wrong answer cluster cannot be given an
adequate label, it should not be used. Table 59 giving information for
the interpretation of wrong answer cluster W8 follows.

TABLE 59

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W8

                                  Advance
Foil   Content                 Classification   Dn

11D    Aggression                   Inv         [illegible]
28D    Stupidity, Progress          IA          .06
14D    Stupidity, Awareness         Sub         [illegible]






Table 59 shows that Cluster W8 also seems to be ambiguous.

Item 11: Overt aggression would likely be decreased by:

11D Blocking of many modes of aggression. (Inv)

There are several ways in which this foil can be interpreted.
The blocking of modes of aggression would not reduce overt aggression
except in those specific areas where the blocking occurred. Overt
aggression would increase in areas where the blocking was absent or less
effective, or more socially acceptable to the peer group. If the
blocking increased the frustration, the absolute incidence of overt
aggression would also increase. For this latter reason (i.e. since
blocking would probably have the opposite to the desired effect) this
foil was classified as an Inv. However, to arrive at this conclusion as
being correct the examinee must make the invalid assumption that attempts
to regulate overt behavior also regulate innate drives. This foil could,
therefore, be classified as an IA foil. This decision gave the cluster
a majority of IA foils. If IA was also a reasonable alternative for
14D, then IA would be a reasonable classification for this cluster so
far as Group A was concerned.

Item 14: Which of the following best describes the probable
relationship between contempt for stupidity and
generalized intellectual awareness?

14D Either will increase with an increase in the other. (Sub)

Foil 14D was one of the foils about which the raters showed
considerable disagreement. It was classified as Sub because part of the
relationship was wrong, and it did not seem to relate to the other Inv
foils in the pilot study. This relationship could also be an Inv,
because a direct relationship is logically opposite to an inverse
relationship. Powell and Isbister (1969) encountered the same problem
in logical relations (as compared to logical fallacies) type foils. On
the other hand, this foil could wrongly be considered correct if the
examinee forms an invalid assumption relating "critical thinking" with
contempt for stupidity by defining stupidity in terms of uncritical
thinking.

Other possible classifications were given to this foil with
similarly tangled arguments justifying each rater's conclusions.

Two of these foils can be reasonably regarded as IA foils; the
third could well be stretching the point. In any case, a reasonable
over-all classification for this wrong answer cluster would seem to be
IA. Table 60 giving information for the interpretation of wrong answer
cluster W9 follows.


TABLE 60

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W9

                                  Advance
Foil   Content                 Classification   Dn

3D     Stupidity                    OS          [illegible]
28D    Progress                     Irr         .25
10D    Aggression                   Inv         .06
17D    Discipline                   Sub         .09
27D    Progress                     O           .06
5D     Awareness                    OS          [illegible]

In Table 60 the two OS's and the O foil account for half of the
foils in Cluster W9, making an OS classification of this cluster within
the rules based on the advance classification. It would be better if the
other three could be alternatively classified as OS.
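The tally behind this statement can be reproduced with a short sketch. It is an illustration only; the function name `cluster_label_support` is an assumption, and the list holds the advance classes of the six foils as given in Table 60 and in the item discussions. An O (Other) foil is counted toward whichever class is being considered, following the "O" rule used earlier.

```python
from collections import Counter

def cluster_label_support(advance_classes):
    """Count how many foils support each named class, letting O
    (Other) foils count toward every candidate class."""
    named = Counter(c for c in advance_classes if c != "O")
    wildcards = sum(1 for c in advance_classes if c == "O")
    return {cls: n + wildcards for cls, n in named.items()}

# Advance classes of the six foils in this cluster.
cluster_classes = ["OS", "Irr", "Inv", "Sub", "O", "OS"]
support = cluster_label_support(cluster_classes)
best = max(support, key=support.get)
share = support[best] / len(cluster_classes)
```

With these inputs the best-supported class is OS, covering three of the six foils, which is why the remaining three foils are examined individually below.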
Item 28: In Chester's progress report, which one of the
following is the most important factor contributing to
his difficulty with school achievement?

28D His ability to develop a generalized intellectual
awareness. (Irr)

This foil was classified as Irr on the basis that in Grade 6 most
children are still too young to have progressed very far into "Formal
Operations" which form the basis of generalized intellectual awareness.
This classification assumes that the examinees know this information
about development, which is an unreasonable assumption. In the absence
of this information, Chester's problems also involve decoding skills in
reading and to a lesser extent in arithmetic. Hence his problems involve
more than this foil suggests, and the foil might, for this reason, be
reclassified as an OS foil.


Item 10: With which of the following statements concerning
aggression would the author be most likely to agree?

10D Aggression generally interferes with the attainment of
educational goals. (Inv)

This foil was classified as an Inv because the best answer was
"Aggression is potentially useful for educational purposes."
Superficially 10D would seem to be opposite to the right answer. On the
other hand, aggression can interfere with the educational process.
Aggression expressed in the form of competition may be a useful form of
intrinsic motivation. The term "generally" in the foil overstates the
case for the negative aspects of aggression whereas the foil itself
understates the total picture. This foil might be classified as either
OS or OG depending on how the examinee looks at the item.


Item 17: From the above passage we can infer that Doris'
leadership of the group was:

17D3 Laissez-faire. (Sub)

The reason for classifying this foil as Sub was that Doris'
attempts to coerce were ineffective, so she let the teacher take over.
As a result her later leadership was laissez-faire but only under the
arbitrary intervention of the teacher. As pointed out in the original
discussion of this item, even this argument is stretching the point.
There is no similar plausible argument which might make this foil an OS.

Five of the six foils in this cluster can be included in the OS
category; hence the advance classification of this wrong answer cluster
is retained. Information for the interpretation of wrong answer cluster
W10 follows in Table 61.

TABLE 61

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W10

                                  Advance
Foil   Content                 Classification   Dn

29D    Discipline, Progress         CM          .08

Cluster W10 as given in Table 61 is the second of two single
member clusters. It contains only 29D which was originally classified
as a CM for the reasons already given (see: pp. 180, 181). However, the
fact that it did not occur in a common cluster with the other CM foils
raises some doubts over this classification within the confines of
Group A. It is possible, however, to classify this foil into other
categories.

Item 29: On the basis of the foregoing which of the following
seems to be the most important consideration when
preparing anecdotal records or progress reports?

29D Make no attempts at interpretation since your judgments
are probably biased. (CM)

The suggestion in this foil that all interpretations are
sufficiently biased so as to be of little value has the effect of
redefining the term "interpretation" to mean "biased interpretation."

In this case, it would be necessary to assume that the other
single member group (W4) which is also an RT (Redefinition of Terms)
split from this one along content lines. The obvious vocabulary-content
linkage in both of these foils makes this conclusion reasonable. Foil
29D was reclassified, therefore, as RT (Redefinition of Terms) making
Cluster W10 an RT cluster. The apparent content binding of some
misreading type foils would seem reasonable, since there seems to be a
parallel group of foil levels to the right answer levels in Bloom's
Taxonomy. Bloom's description of the levels in the Taxonomy suggests a
steady progression away from context as the level of the categories
increases. For this reason, it seemed reasonable to combine W10 with
W4 as an RT cluster in subsequent analysis, rather than to discard both
clusters because of their small size.

Table 62 gives information for the interpretation of wrong
answer cluster W11.

TABLE 62

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W11

                                  Advance
Foil   Content                 Classification   Dn

18D    Discipline                   Sub         [illegible]
7D     Awareness                    WW          [illegible]
6D     Awareness                    [illegible] [illegible]
16D    Discipline                   CM          .14
2D     Stupidity                    OS          .09
10D    Aggression                   WW          [illegible]


In Table 62 the most frequent foil category in Cluster W11 is
WW (Word-Word Link) which serves by the decision rules to identify this
cluster.

A logico-semantic analysis of the other foils might reinforce
this classification.


Item 18: From the description of the incident, we can conclude
that the teacher's handling of the incident was:

18D Good: she intervened to prevent a serious conflict from
continuing. (Sub)

The reading selection states: "There seemed to be confusion in
this group so I decided to investigate." The similarity between this
statement and the phrase "intervened to prevent" in the foil is
self-evident. WW would seem to be a reasonable alternative
classification for this foil.


Item 6: The purpose of developing a generalized intellectual 
awareness is to: 


6D, Give the individual an ever-widening view of his world. 
(ae) 


The similarity between the phrase "ever-widening view" in this 
foil and the phrase "free-ranging understanding" in the reading 
selection warranted the use of the alternative class of WW for this foil. 


Item 2: Which of the following is the most important causative
factor of contempt for stupidity?

2D, Compulsory written examinations. (OS)

The phrase "compulsory written examinations" occurs in both this
foil and the reading selection.

Item 16: If the teacher had written "Doesn't work well with
others," as an anecdotal record for the above incident,
this would have been:

16D, Worse: Teachers are failing in their obligations in not
supplying complete information. (CM)


There is no connection between foil 16D, and the stem or
the reading selections similar to the ones presented above. The above
discussion, nonetheless, in general supports the retention of WW as a
reasonable interpretation for this wrong answer cluster.

Since, as shown in Table 63 (see: p. 235), both of these foils
(19D, and 24D,) are classified as O and both come from the same content,
there may be some doubt about the interpretation of this cluster.

TABLE 63

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W12

                    Advance           D
Foil    Content     Classification    n
19D,    Progress    O                 .09
24D,    Progress    O                 .07

One obvious course of action with this foil cluster would be to
drop it from further analysis. On the other hand, the foil classes in
the Guidelines made no pretence at being exhaustive. Since the
experience with other foil clusters has been that logico-semantic
analysis has often revealed a possible common base for interpreting
foils within a particular cluster, such an analysis for this cluster may
assist in illuminating how the list of Guidelines might be
extended. For this reason, these foils were also analysed.

To begin with, the fact that they stood separate from any of the
interpreted categories suggests that these two foils may form a foil
class, the basis of which has not yet been determined, but possibly
related to the special format of items 19 to 24 inclusive.

Items 19 to 24 inclusive used a classification protocol designed 
to get the examinees to treat the statements represented by these items 
as hypotheses to be tested on the basis of the information in the reading 
selection from Prescott (1957) concerning Chester's Progress Report. The 
categories were:

1. supported.
2. implied.
3. refuted.
4. insufficient evidence.

In these categories there is a hierarchy of inferential support
from insufficient evidence, to implied, to supported.

The "correct" answer for item 19 is "implied" and the
answer given in 19D, was "supported." Similarly, the
"correct" answer for item 24 was "insufficient evidence" and the
corresponding answer in this cluster was "implied." Each answer given
in these foils was one step higher in the hierarchy than the "correct"
answer. This "overstatement" was not the same as "overgeneralization"
as defined in this study. Whether or not such a relationship is
exclusive to this type of question remains undetermined. Rather than
name the class prematurely, the O (Other) classification of this cluster
was retained. Since another cluster was interpreted as O (i.e. W13), a
subscript was applied for purposes of distinguishing between these two
clusters.
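
The "one step higher" relationship noted for these two foils can be checked mechanically. The following sketch is illustrative only: the ordering is taken from the hierarchy of inferential support described in this section, "refuted" is assumed to lie outside that hierarchy, and the function name is hypothetical.

```python
# Hierarchy of inferential support, weakest to strongest, as described in the text.
HIERARCHY = ["insufficient evidence", "implied", "supported"]


def overstatement_steps(correct: str, chosen: str) -> int:
    """Return how many steps the chosen answer lies above the correct answer.

    A positive result indicates overstatement. Answers outside the hierarchy
    (e.g. "refuted") raise ValueError, since no step count is defined for them.
    """
    return HIERARCHY.index(chosen) - HIERARCHY.index(correct)


# Item 19: correct answer "implied", foil answer "supported".
print(overstatement_steps("implied", "supported"))
# Item 24: correct answer "insufficient evidence", foil answer "implied".
print(overstatement_steps("insufficient evidence", "implied"))
```

Both calls return 1, which matches the observation that each foil in this cluster overstates the correct answer by a single level.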

Once again, Cluster W13, as given in Table 64 (see: p. 156), is
an O classification based on the advance foil classification. The
logico-semantic analysis of this cluster proved to be very interesting.
A slight change of format will be used in this discussion, with all
three items being presented before the discussion of them as a cluster.


Item 21: Chester is growing very slowly and is quite immature
for his grade. Everyone expects too much of him.

21D, The hypothesis is supported by the facts. (O)

Item 28: In Chester's progress report, which one of the
following is the most important factor contributing to
his difficulty with school achievement?

28D, His weakness in reading which is affecting all areas of
learning. (OG)

Item 23: Chester's reading deficiency has not yet begun to
affect seriously his performance in other areas.

23D, The hypothesis is refuted by the facts. (O)


TABLE 64

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W13

                    Advance           D
Foil    Content     Classification    n
21D,    Progress    O                 .07
28D,    Progress    OG                .32
23D,    Progress    O                 .11


As it happens, the hypothesis in item 23 is supported rather
than refuted by the facts. Notice, however, that this same
contrary-to-fact conclusion is stated in foil 28D, and implied in the
response to item 21. The positive statement of this false conclusion was most
frequently selected (i.e. 28D, for which the Dn was .32). The negative
statement (23D,) was less frequently selected (Dn = .11) and the implied
statement (21D,) least frequent (Dn = .07). This would seem to be
reasonable. Also, this cluster holds together better than any other in
cross validation (two of the three form a new three-member cluster).

It would seem, then, that this cluster is content bound, not so
much on the basis of a common reading selection, but rather on a single
common contrary-to-fact conclusion formed by some of the examinees.

The O (Other) designation was, therefore, retained for this
wrong answer cluster, with a distinguishing subscript. Table 65, giving
information on Cluster W14, follows.

TABLE 65

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W14

                        Advance           D
Foil    Content         Classification    n
3D, Stupidity | Irr Be 
£5), Progress iireie Pe] 
oD ’ Aggression Sub Bp ky 

(s 

e7D,, Progress hes 09 








Wrong answer Cluster W14 contained Irr foils in three out of
four cases. The remaining foil, classified as Sub, cannot be
reclassified as Irr since it is not a true statement. However, the Irr
classification of this cluster was retained on the basis of the decision
rules already discussed (see: p. 237). Table 66, giving information for
the interpretation of wrong answer cluster W15, follows (see: p. 239).

TABLE 66

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W15

                        Advance           D
Foil    Content         Classification    n
30D,, Summary of all 
$ Selections Jui Ae i 
24D. Progress O 006 





Wrong answer cluster W15, as given in Table 66, could not be
interpreted without logico-semantic analysis because it did not contain
a most frequent category by the advance foil classification. A
reasonable approach would be to investigate the possibility that 24D,
might also be classified as Tr.


Item 24: Chester's mother has kept after him about his reading 
until he hates it. 


24D, The hypothesis is refuted by the facts. 

Several facts can be derived from Chester's "Progress Report" 
which have a bearing on Item 24. The three most important are: 

1. "Does not enjoy reading."

2. "Has a wide speaking vocabulary."

3. "Likes spelling."

From these three facts two conclusions can be drawn: 


1) Chester's background must be fairly verbal, hence his
reading deficiency is probably not the direct product of a
background disadvantage.

2) Since he likes spelling and he "is better able to find facts
than to interpret facts," his reading problem would seem to
be a decoding problem.

In addition, the report gives no direct evidence about Chester's
home background. Hence the best answer for this question is
"insufficient evidence." In order to have the proposition presented in
Item 24 refuted by the facts, more would have to be known about the probable
sources of the decoding problems and the wide speaking vocabulary. Only
if these two considerations were emphasized beyond reason could the
refutation be acceptable. This conclusion might be expected to have
arisen, then, from a reordering of the emphasis given to parts of the
reading selection vis-à-vis other parts of the selection. The
classification of 24D, as Tr would seem to be appropriate in this case,
giving the entire cluster this same interpretation.